MaCBench

Interpretation

Interpretation Performance

Rank	Model	Score
1	Claude-3.5-Sonnet	0.693
2	GPT-4o	0.519
3	Gemini-1.5-Pro	0.434
4	Llama 3.2 90B Vision	0.401
5	Baseline	0.218

Sub-Task Performance

Performance on the individual sub-tasks within the Interpretation domain.