A display-only Artificial Analysis long-context reasoning evaluation.
BenchLM mirrors the published AA-LCR scores. GPT-5.2-Codex leads the public snapshot at 75.7%, followed by GPT-5 (high) at 75.6% and GPT-5.1 at 75.0%. BenchLM does not use these results in its overall model rankings.
Top three models:
GPT-5.2-Codex (OpenAI): 75.7%
GPT-5 (high) (OpenAI): 75.6%
GPT-5.1 (OpenAI): 75.0%
The published AA-LCR snapshot is tightly clustered at the top: GPT-5.2-Codex sits at 75.7%, and third-place GPT-5.1 is only 0.7 points behind. Across the top 10, scores span just 3.0 points, so the leading published results fall within a narrow band.
A total of 117 models have been evaluated on AA-LCR. The benchmark belongs to the Reasoning category, which carries a 17% weight in BenchLM.ai's overall scoring system. AA-LCR itself, however, is displayed for reference only and excluded from the scoring formula, so it does not directly affect overall rankings.
Year: 2026
Tasks: Long-context reasoning tasks
Format: Accuracy
Difficulty: Long-context reasoning
BenchLM stores AA-LCR as a display-only row, recorded only when OpenRouter or Artificial Analysis publishes the exact long-context reasoning value on a model's card.
Version: AA-LCR 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.