
Abstract:
A common and controversial use of text-to-image models is to generate pictures by explicitly naming artists, such as “in the style of Greg Rutkowski”. Because the original prompt is usually unavailable, online platforms lack a reliable way to decide whether an uploaded image should be filtered for invoking artist names in its prompt, a step often required when the artist has not given consent.
We introduce a benchmark for prompted-artist recognition: predicting, from the image alone, which artist names were invoked in the prompt. The dataset contains 1.95M images covering 110 artists and spans four generalization settings: held-out artists, increasing prompt complexity, multiple-artist prompts, and different text-to-image models.
We evaluate feature similarity baselines, contrastive style descriptors, data attribution methods, supervised classifiers, and few-shot prototypical networks. Generalization patterns vary: supervised and few-shot models excel on seen artists and complex prompts, whereas style descriptors transfer better when the artist’s style is pronounced; multi-artist prompts remain the most challenging.
Our benchmark reveals substantial headroom and provides a public testbed to advance the responsible moderation of text-to-image models. We will release the dataset and benchmark to foster further research.
Committee:
Prof. Jun-Yan Zhu (Advisor)
Prof. Jean Oh
Sheng-Yu Wang