
👑 Interpretation Performance

Rank  Model                  Score
1     Claude-3.5-Sonnet      0.693
2     GPT-4o                 0.519
3     Gemini-1.5-Pro         0.434
4     Llama 3.2 90B Vision   0.401
5     Baseline               0.218

Sub-Task Performance

Performance across the individual sub-tasks in this domain.