Coffee Cup Background Resolution Benchmark

AI image generators tend to nail the hero subject and give up on everything else. This benchmark measures exactly that: generate an ordinary scene, then check whether the background coffee cups are real objects or shapeless blobs.

7 Quality Metrics · YOLO + OWL-ViT Detection · CLIP Semantic Scoring · Open Source

Models Tested (unique submissions)
Images Evaluated (total benchmark runs)
Top Score (highest avg quality)
Detection Rate (avg bg cup detection)

Leaderboard

Rank | Model | Quality | Detection | Semantic | Resolution | Structure | Bar

How It Works

1. Generate Images

Use any AI model with the standard prompts from our prompt set. Each prompt places coffee cups in the background of realistic scenes.

2. Run Benchmark

Our pipeline detects background cups using YOLO + OWL-ViT, then evaluates each on 7 quality metrics including CLIP semantic coherence.

pip install -r requirements.txt
python scripts/run_benchmark.py --image-dir ./outputs/
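
Running two detectors on the same image usually yields overlapping boxes for the same cup, so some merging step is needed before scoring. The sketch below is one plausible way to do that, not the benchmark's actual code: the `Box` type, the IoU threshold of 0.5, and the rule that a "background" cup covers at most 5% of the image are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box in pixels (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def merge_detections(first, second, iou_threshold=0.5):
    """Pool boxes from two detectors, keeping the higher-scoring box
    whenever two boxes overlap by at least iou_threshold."""
    kept = []
    for box in sorted([*first, *second], key=lambda b: b.score, reverse=True):
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept


def background_only(boxes, image_area, max_area_fraction=0.05):
    """Assumed heuristic: treat a cup as background if it is small
    relative to the whole image."""
    return [b for b in boxes if b.area / image_area <= max_area_fraction]
```

With boxes from both detectors in hand, `background_only(merge_detections(yolo_boxes, owl_boxes), width * height)` would give the list of cups to score.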

3. Submit Results

Export your results and submit them to the leaderboard. Submissions are validated and ranked by overall quality score.

python scripts/run_benchmark.py --submit \
    --model "my-model" --results results.json
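
To make "ranked by overall quality score" concrete, here is a minimal sketch of how such a ranking could work. The submission layout (a `model` name plus a list of per-image `image_scores`), the assumption that scores lie in [0, 1], and the function names are all hypothetical; the real `results.json` schema and validation rules may differ.

```python
from statistics import mean


def validate(submission: dict) -> None:
    """Reject submissions that are obviously malformed.
    The [0, 1] score range is an assumption, not the benchmark's rule."""
    if not isinstance(submission.get("model"), str) or not submission["model"]:
        raise ValueError("submission needs a non-empty model name")
    scores = submission.get("image_scores")
    if not scores:
        raise ValueError("submission needs at least one image score")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores are assumed to lie in [0, 1]")


def rank_submissions(submissions):
    """Return (rank, model, average quality) tuples, best first."""
    for submission in submissions:
        validate(submission)
    ranked = sorted(
        submissions, key=lambda s: mean(s["image_scores"]), reverse=True
    )
    return [
        (rank, s["model"], round(mean(s["image_scores"]), 3))
        for rank, s in enumerate(ranked, start=1)
    ]
```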

4. Compare

See how your model stacks up across all metrics. Identify specific weaknesses in semantic understanding, resolution, or artifact generation.
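
One simple way to spot a model's weak areas is to compare its per-metric scores against a reference, such as the leaderboard average, and list the largest shortfalls. The sketch below assumes metric names taken from the leaderboard columns and made-up scores; `weakest_metrics` is a hypothetical helper, not part of the benchmark scripts.

```python
def weakest_metrics(model_scores: dict, reference_scores: dict, top_n: int = 2):
    """Return the top_n metrics where the model trails the reference most.
    Each entry is (metric name, model score minus reference score)."""
    gaps = {
        name: model_scores[name] - reference_scores[name]
        for name in model_scores
        if name in reference_scores
    }
    # Most negative gap first: those are the biggest shortfalls.
    return sorted(gaps.items(), key=lambda item: item[1])[:top_n]


# Illustrative numbers only, not real benchmark results.
my_model = {"detection": 0.8, "semantic": 0.4, "resolution": 0.6, "structure": 0.7}
reference = {"detection": 0.7, "semantic": 0.6, "resolution": 0.65, "structure": 0.7}
shortfalls = weakest_metrics(my_model, reference)
```

Here `shortfalls` lists `semantic` first and `resolution` second, since those are the two metrics where the example model falls furthest below the reference.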