👑 Toxicology

| Rank | Model | Fraction Correct |
|------|-------|------------------|
| 1 | Claude-3.5 (Sonnet) | 0.718 |
| 2 | GPT-4o | 0.713 |
| 3 | GPT-4 | 0.658 |
| 4 | Claude-3 (Opus) | 0.64 |
| 5 | Llama-3-70B-Instruct | 0.632 |
| 6 | GPT-3.5 Turbo | 0.558 |
| 7 | Claude-2 | 0.538 |
| 8 | Claude-2-Zero-T | 0.529 |
| 9 | Llama-3-8B-Instruct | 0.527 |
| 10 | Phi-3-Medium-4k-Instruct | 0.518 |
| 11 | Command-R+ | 0.501 |
| 12 | Gemini-Pro | 0.485 |
| 13 | Mistral-8x7b-Instruct | 0.409 |
| 14 | Gemma-7b-Instruct | 0.405 |
| 15 | Galatica-120b | 0.348 |

Leaderboard Plot

The following plot ranks the models by the fraction of questions they answered correctly. For each model, this fraction is the number of correct answers divided by the total number of answers the model gave. Models are sorted in descending order of fraction correct.
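The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual scoring code: the function names `fraction_correct` and `rank_models`, the model names, and the per-question outcomes in `toy_results` are all hypothetical.

```python
from typing import Dict, List, Tuple


def fraction_correct(outcomes: List[bool]) -> float:
    """Return correct answers divided by total answers for one model."""
    if not outcomes:
        raise ValueError("no answers to score")
    # True counts as 1 and False as 0, so sum() is the number of correct answers.
    return sum(outcomes) / len(outcomes)


def rank_models(results: Dict[str, List[bool]]) -> List[Tuple[int, str, float]]:
    """Score every model and sort in descending order of fraction correct."""
    scored = {name: fraction_correct(outcomes) for name, outcomes in results.items()}
    ordered = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    # Ranks start at 1, matching the leaderboard table.
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


# Hypothetical per-question outcomes (True = answered correctly).
toy_results = {
    "model-a": [True, False, True, True],
    "model-b": [True, True, True, True],
    "model-c": [False, False, True, False],
}

for rank, name, score in rank_models(toy_results):
    print(f"{rank} {name} {score:.3f}")
```

Python's `sorted` is stable, so in this sketch models with identical scores keep their input order. How the real leaderboard breaks ties is not stated in the source.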