Artificial Analysis Long Context Reasoning (AA-LCR)

A display-only Artificial Analysis long-context reasoning evaluation.

Benchmark score on AA-LCR — June 2, 2026

BenchLM mirrors the published score view for AA-LCR. GPT-5.2-Codex leads the public snapshot at 75.7%, followed by GPT-5 (high) (75.6%) and GPT-5.1 (75.0%). BenchLM does not use these results to rank models overall.

117 models · Reasoning · Current · Display only · Updated June 2, 2026

The published AA-LCR snapshot is tightly clustered at the top: GPT-5.2-Codex sits at 75.7%, while the third row is only 0.7 points behind. The broader top-10 spread is 3.0 points, so many of the published scores sit in a relatively narrow band.
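The gap and spread figures quoted above can be checked directly against the published scores. The following is a minimal sketch, assuming the top-10 values are copied by hand from the score table on this page; the `gap` helper is a hypothetical name introduced for illustration and is not part of any BenchLM or Artificial Analysis tooling.

```python
# Top-10 published AA-LCR scores (percent), copied from the score table on this page.
top10 = [75.7, 75.6, 75.0, 74.3, 74.0, 74.0, 74.0, 73.3, 72.8, 72.7]


def gap(scores, i, j):
    """Return the point gap between rows i and j (1-indexed), rounded to one decimal."""
    return round(scores[i - 1] - scores[j - 1], 1)


# Gap between the leader and the third row.
lead_over_third = gap(top10, 1, 3)

# Spread between the first and tenth rows.
top10_spread = gap(top10, 1, 10)

print(lead_over_third, top10_spread)
```

Rounding to one decimal place matches the precision of the published scores and avoids floating-point noise in the subtraction.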

117 models have been evaluated on AA-LCR. The benchmark falls in the Reasoning category. This category carries a 17% weight in BenchLM.ai's overall scoring system. AA-LCR is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.

About AA-LCR

Year: 2026
Tasks: Long-context reasoning tasks
Format: Accuracy
Difficulty: Long-context reasoning

BenchLM stores AA-LCR as a display-only row when OpenRouter or Artificial Analysis publishes the exact long-context reasoning card value.

BenchLM freshness & provenance

Version: AA-LCR 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (117 models)

Rank  Score
1     75.7%
2     75.6%
3     75.0%
4     74.3%
5     74.0%
6     74.0%
7     74.0%
8     73.3%
9     72.8%
10    72.7%
11    72.7%
12    70.7%
13    70.7%
14    70.3%
15    69.7%
16    69.7%
17    69.7%
18    69.7%
19    69.3%
20    69.3%
21    69.3%
22    69.0%
23    68.7%
24    68.7%
25    68.0%
27    67.7%
28    67.3%
29    67.3%
30    67.3%
31    67.0%
32    66.7%
33    66.7%
34    66.3%
35    66.3%
36    66.0%
37    66.0%
38    65.7%
39    65.3%
40    65.3%
41    65.3%
42    65.3%
43    65.0%
44    64.7%
45    64.3%
46    64.0%
47    63.7%
48    63.7%
49    63.3%
50    63.0%
51    62.7%
52    62.7%
53    62.3%
54    62.0%
55    61.0%
56    61.0%
57    61.0%
58    60.7%
59    60.7%
60    59.3%
61    58.3%
62    58.0%
63    57.7%
64    55.7%
65    55.7%
66    54.7%
67    54.7%
68    53.3%
69    51.0%
70    50.7%
71    48.3%
72    48.0%
73    46.7%
74    46.0%
75    46.0%
76    45.9%
77    45.0%
78    44.7%
79    44.7%
80    44.3%
81    43.7%
82    42.3%
83    39.0%
85    34.7%
86    33.0%
87    33.0%
88    31.3%
89    30.7%
90    30.7%
91    29.0%
92    28.0%
93    26.3%
94    25.8%
95    25.0%
96    24.3%
97    22.0%
98    21.0%
99    19.0%
100   17.0%
101   15.0%
102   9.7%
103   8.0%
104   7.3%
105   6.7%
106   6.3%
107   5.7%
108   5.3%
109   4.0%
110   0.0%
111   0.0%
112   0.0%
113   0.0%
114   0.0%
115   0.0%
116   0.0%
117   0.0%

FAQ

What does AA-LCR measure?

A display-only Artificial Analysis long-context reasoning evaluation.

Which model scores highest on AA-LCR?

GPT-5.2-Codex by OpenAI currently leads with a score of 75.7% on AA-LCR.

How many models are evaluated on AA-LCR?

117 AI models have been evaluated on AA-LCR on BenchLM.

Last updated: June 2, 2026 · BenchLM version AA-LCR 2026