
Artificial Analysis GPQA Diamond (AA-GPQA Diamond)

A display-only Artificial Analysis GPQA Diamond score.

Benchmark score on AA-GPQA Diamond — June 2, 2026

BenchLM mirrors the published score view for AA-GPQA Diamond. Gemini 3.1 Pro leads the public snapshot at 94.1%, followed by GPT-5.5 (93.5%) and Qwen3.7 Max (92.3%). BenchLM does not use these results to rank models overall.

124 models · Knowledge · Current · Display only · Updated June 2, 2026

The published AA-GPQA Diamond snapshot is tightly clustered at the top: Gemini 3.1 Pro sits at 94.1%, while the third row is only 1.8 points behind. The broader top-10 spread is 3.3 points, so many of the published scores sit in a relatively narrow band.
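The two gaps quoted above can be reproduced from the top ten scores in the table on this page. The snippet below is a minimal sketch of that arithmetic; the helper name `gap` is an illustrative choice, not something BenchLM publishes.

```python
# Top-10 published scores, copied from the score table on this page.
top10 = [94.1, 93.5, 92.3, 92.2, 92.0, 92.0, 91.5, 91.4, 91.1, 90.8]


def gap(scores, lower_rank, upper_rank=1):
    """Percentage-point difference between two 1-indexed ranks.

    Rounded to one decimal place to match the precision of the
    published scores and to hide floating-point noise.
    """
    return round(scores[upper_rank - 1] - scores[lower_rank - 1], 1)


third_place_gap = gap(top10, 3)   # leader vs. third row
top10_spread = gap(top10, 10)     # leader vs. tenth row

print(third_place_gap, top10_spread)  # prints: 1.8 3.3
```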

124 models have been evaluated on AA-GPQA Diamond. The benchmark falls in the Knowledge category. This category carries a 12% weight in BenchLM.ai's overall scoring system. AA-GPQA Diamond is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
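To make the display-only distinction concrete, here is a rough sketch of how a display-only flag could keep a benchmark out of a weighted category score. BenchLM's actual formula is not given on this page, so everything here is an assumption for illustration: the `Benchmark` record, the `category_score` helper, the use of a simple mean, and the placeholder second benchmark and its score. Only the 12% Knowledge weight and the leading 94.1% score come from the page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Benchmark:
    name: str
    category: str
    score: float        # percent, 0-100
    display_only: bool  # True: shown on the page, excluded from scoring


def category_score(benchmarks, category) -> Optional[float]:
    """Mean of the scored (non-display-only) benchmarks in one category.

    Assumption: a plain mean. The real aggregation may differ.
    """
    scored = [
        b.score
        for b in benchmarks
        if b.category == category and not b.display_only
    ]
    return sum(scored) / len(scored) if scored else None


KNOWLEDGE_WEIGHT = 0.12  # from the page: Knowledge carries a 12% weight

# Rows for one model. The second entry is a made-up placeholder, not real data.
rows = [
    Benchmark("AA-GPQA Diamond", "Knowledge", 94.1, display_only=True),
    Benchmark("hypothetical-knowledge-bench", "Knowledge", 80.0, display_only=False),
]

knowledge = category_score(rows, "Knowledge")
contribution = KNOWLEDGE_WEIGHT * knowledge
print(knowledge, contribution)
```

Under these assumptions, changing the AA-GPQA Diamond value leaves `knowledge` unchanged, which is the behavior the paragraph above describes.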

About AA-GPQA Diamond

Year: 2026
Tasks: Graduate-level science questions
Format: Accuracy
Difficulty: Graduate-level science reasoning

BenchLM stores the Artificial Analysis GPQA Diamond result separately from the weighted GPQA lane so AA refreshes remain display-only.

BenchLM freshness & provenance

Version: AA-GPQA Diamond 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (124 models)

Rank  Score
1     94.1%
2     93.5%
3     92.3%
4     92.2%
5     92.0%
6     92.0%
7     91.5%
8     91.4%
9     91.1%
10    90.8%
11    90.5%
12    90.3%
13    90.1%
14    89.9%
15    89.6%
16    89.4%
17    89.3%
18    88.8%
19    88.8%
20    88.5%
21    88.4%
22    88.2%
23    87.9%
24    87.9%
25    87.7%
26    87.5%
27    87.4%
28    87.3%
29    87.0%
30    86.8%
31    86.7%
32    86.7%
33    86.6%
34    86.6%
35    86.1%
36    86.0%
37    86.0%
38    85.9%
39    85.8%
40    85.7%
41    85.7%
42    85.4%
44    84.7%
45    84.7%
46    84.5%
47    84.5%
48    84.4%
49    84.2%
50    84.2%
51    84.1%
52    84.0%
53    82.8%
54    82.7%
55    82.2%
56    82.0%
57    81.7%
58    81.3%
59    81.2%
60    81.0%
61    80.9%
62    80.9%
63    80.9%
64    79.9%
65    79.2%
66    78.3%
67    78.2%
68    77.9%
69    76.9%
70    76.9%
71    76.6%
72    76.4%
73    76.1%
74    75.2%
75    75.2%
76    75.1%
77    74.8%
78    74.8%
79    74.7%
80    73.8%
81    73.5%
82    73.3%
83    72.8%
84    72.7%
85    68.8%
86    68.3%
87    68.3%
88    68.0%
89    67.1%
90    66.6%
91    66.4%
92    65.6%
93    63.7%
94    63.3%
95    63.2%
96    62.8%
97    61.5%
98    59.3%
99    58.9%
100   58.7%
101   57.8%
102   57.6%
103   57.5%
104   56.1%
105   55.7%
106   54.3%
107   51.5%
108   51.2%
109   49.9%
110   48.9%
111   48.6%
113   43.3%
114   42.8%
115   42.6%
116   42.4%
117   41.7%
118   39.9%
119   37.4%
120   28.1%
121   27.7%
122   26.3%
123   26.1%
124   25.7%

FAQ

What does AA-GPQA Diamond measure?

A display-only Artificial Analysis GPQA Diamond score.

Which model scores highest on AA-GPQA Diamond?

Gemini 3.1 Pro by Google currently leads with a score of 94.1% on AA-GPQA Diamond.

How many models are evaluated on AA-GPQA Diamond?

BenchLM lists 124 AI models evaluated on AA-GPQA Diamond.

Last updated: June 2, 2026 · BenchLM version AA-GPQA Diamond 2026
