Benchmark profile

Artificial Analysis Long Context Reasoning (AA-LCR)

A display-only Artificial Analysis long-context reasoning evaluation.

Data verified July 23, 2026

Benchmark score on AA-LCR — July 23, 2026

BenchLM mirrors the published scores for AA-LCR. GPT-5.2-Codex leads the public snapshot at 75.7%, followed by GPT-5 (high) at 75.6% and GPT-5.1 at 75.0%. BenchLM does not use these results in its overall model rankings.

1. GPT-5.2-Codex · OpenAI · Closed · ID gpt-5-2-codex · 75.7% · Overall 59.1 · Context 400K
2. GPT-5 (high) · OpenAI · Closed · ID gpt-5-high · 75.6% · Overall 58.61 · Context 128K
3. GPT-5.1 · OpenAI · Closed · ID gpt-5-1 · 75.0% · Overall 53.65 · Context 200K

155 models · Reasoning · Current · Display only · Updated July 23, 2026

Benchmark score table (155 models)

Rank. Model · Developer · License · AA-LCR score

1. GPT-5.2-Codex · OpenAI · Closed · 75.7%
2. GPT-5 (high) · OpenAI · Closed · 75.6%
3. GPT-5.1 · OpenAI · Closed · 75.0%
4. Kimi K3 · Moonshot AI · Closed · 74.7%
5. GPT-5.5 · OpenAI · Closed · 74.3%
6. GPT-5.6 Terra · OpenAI · Closed · 74.0%
7. GPT-5.6 Luna · OpenAI · Closed · 74.0%
8. GPT-5.4 · OpenAI · Closed · 74.0%
9. MiniMax M3 · MiniMax · Open weight · 74.0%
10. GPT-5.3 Codex · OpenAI · Closed · 74.0%
11. Claude Opus 4.5 Thinking · Anthropic · Closed · 74.0%
12. GPT-5.3-Codex-Spark · OpenAI · Closed · 74.0%
13. GPT-5.6 Sol · OpenAI · Closed · 73.7%
14. MiMo-V2.5-Pro · Xiaomi · Closed · 73.3%
15. GPT-5 (medium) · OpenAI · Closed · 72.8%
16. Gemini 3.1 Pro · Google · Closed · 72.7%
17. GPT-5.2 · OpenAI · Closed · 72.7%
18. GLM-5.2 · Z.AI · Open weight · 71.3%
19. Claude Sonnet 5 · Anthropic · Closed · 70.7%
20. Gemini 3 Pro · Google · Closed · 70.7%
21. Claude Opus 4.6 (Adaptive) · Anthropic · Closed · 70.7%
22. Claude Opus 4.7 (Adaptive) · Anthropic · Closed · 70.3%
23. Claude Fable 5 · Anthropic · Closed · 70.0%
24. Kimi K2.6 · Moonshot AI · Open weight · 69.7%
25. Qwen3.6 Plus · Alibaba · Closed · 69.7%
26. Gemini 3.6 Flash · Google · Closed · 69.7%
27. Qwen 3.6 Max (preview) · Alibaba · Closed · 69.7%
28. Muse Spark · Meta · Closed · 69.7%
29. Gemini 3.5 Flash · Google · Closed · 69.3%
30. GPT-5.4 mini · OpenAI · Closed · 69.3%
31. o3 · OpenAI · Closed · 69.3%
32. Qwen3.7 Max · Alibaba · Closed · 69.0%
33. Qwen3.6-27B · Alibaba · Open weight · 68.7%
34. MiniMax M2.7 · MiniMax · Open weight · 68.7%
35. Grok 4 · xAI · Closed · 68.0%
36. Grok 4.1 Fast (Reasoning) · xAI · Closed · 68.0%
37. GPT-5 mini · OpenAI · Closed · 68.0%
38. Claude Opus 4.8 · Anthropic · Closed · 67.7%
39. Grok 4.5 · xAI · Closed · 67.7%
40. Qwen3.5-27B · Alibaba · Open weight · 67.3%
41. GPT-5.1-Codex-Max · OpenAI · Closed · 67.3%
42. GPT-5.1-Codex · OpenAI · Closed · 67.3%
43. Nemotron 3 Ultra · NVIDIA · Open weight · 67.0%
44. Claude Opus 4.7 · Anthropic · Closed · 67.0%
45. Qwen3.5-122B-A10B · Alibaba · Open weight · 66.7%
46. Hy3 Preview · Tencent · Open weight · 66.7%
47. MiMo-V2-Omni · Xiaomi · Closed · 66.7%
48. Hy3 · Tencent · Open weight · 66.7%
49. DeepSeek V4 Pro (Max) · DeepSeek · Open weight · 66.3%
50. Claude 4.1 Opus Thinking · Anthropic · Closed · 66.3%
51. Kimi K2.7 Code · Moonshot AI · Open weight · 66.3%
52. GPT-5.4 nano · OpenAI · Closed · 66.0%
53. Gemini 2.5 Pro · Google · Closed · 66.0%
54. MiniMax M2.5 · MiniMax · Closed · 66.0%
55. Qwen3.5 397B · Alibaba · Open weight · 65.7%
56. Qwen3.5 397B (Reasoning) · Alibaba · Open weight · 65.7%
57. Claude Opus 4.5 · Anthropic · Closed · 65.3%
58. Kimi K2.5 · Moonshot AI · Open weight · 65.3%
59. Kimi K2.5 (Reasoning) · Moonshot AI · Closed · 65.3%
60. Gemini 3.1 Flash-Lite · Google · Closed · 65.3%
61. Qwen3.7 Plus · Alibaba · Closed · 65.0%
62. DeepSeek V4 Pro (High) · DeepSeek · Open weight · 65.0%
63. Grok 4 Fast (Reasoning) · xAI · Closed · 64.7%
64. Grok 4.3 · xAI · Closed · 64.3%
65. GLM-4.7 · Z.AI · Open weight · 64.0%
66. Qwen3.6-35B-A3B · Alibaba · Open weight · 63.7%
67. Step 3.7 Flash · StepFun · Open weight · 63.7%
68. Muse Spark 1.1 · Meta · Closed · 63.3%
69. GLM-5 · Z.AI · Open weight · 63.3%
70. Inkling · Thinking Machines Lab · Open weight · 63.3%
71. DeepSeek V4 Flash (Max) · DeepSeek · Open weight · 63.0%
72. DeepSeek V4 Flash (High) · DeepSeek · Open weight · 62.7%
73. Qwen3.5-35B-A3B · Alibaba · Open weight · 62.7%
74. GLM-5.1 · Z.AI · Open weight · 62.3%
75. Gemini 3.5 Flash-Lite · Google · Closed · 62.0%
76. Gemma 4 31B · Google · Open weight · 62.0%
77. Mistral Medium 3.5 128B · Mistral · Open weight · 61.0%
78. GPT-4.1 · OpenAI · Closed · 61.0%
79. GLM-5V-Turbo · Z.AI · Closed · 61.0%
80. GLM-5-Turbo · Z.AI · Closed · 60.7%
81. MiMo-V2-Pro · Xiaomi · Closed · 60.7%
82. Nemotron 3 Super 120B A12B · NVIDIA · Open weight · 60.0%
83. o1 · OpenAI · Closed · 59.3%
84. Claude Opus 4.6 · Anthropic · Closed · 58.3%
85. Claude Sonnet 4.6 · Anthropic · Closed · 57.7%
86. Gemma 4 26B A4B · Google · Open weight · 55.7%
87. K-Exaone · LG AI Research · Closed · 55.7%
88. Gemma 4 12B · Google · Open weight · 55.3%
89. DeepSeek-R1 · DeepSeek · Open weight · 54.7%
90. Step 3.5 Flash · StepFun · Open weight · 54.3%
91. MiniMax M1 80k · MiniMax · Closed · 54.3%
92. DeepSeek V3.1 (Reasoning) · DeepSeek · Open weight · 53.3%
93. Kimi K2 · Moonshot AI · Closed · 51.0%
94. GPT-OSS 120B · OpenAI · Open weight · 50.7%
95. Grok Code Fast 1 · xAI · Closed · 48.3%
96. Gemini 3 Flash · Google · Closed · 48.0%
97. Qwen3 Max · Alibaba · Closed · 46.7%
98. Llama 4 Maverick · Meta · Open weight · 46.0%
99. Command A+ · Cohere · Open weight · 46.0%
100. Gemini 2.5 Flash · Google · Closed · 45.9%
101. DeepSeek V3.1 · DeepSeek · Open weight · 45.0%
102. Mistral Small 4 · Mistral · Open weight · 44.7%
103. Mistral Small 4 (Reasoning) · Mistral · Open weight · 44.7%
104. Claude 4 Sonnet · Anthropic · Closed · 44.3%
105. GLM-4.5-Air · Z.AI · Closed · 43.7%
106. GPT-4.1 mini · OpenAI · Closed · 42.3%
107. GPT-5 nano · OpenAI · Closed · 41.7%
108. DeepSeek V3.2 · DeepSeek · Open weight · 39.0%
109. Mercury 2 · Inception · Closed · 36.3%
110. Nemotron 3 Nano Omni 30B A3B · NVIDIA · Open weight · 35.7%
111. GLM-4.7-Flash · Z.AI · Open weight · 35.0%
112. Mistral Large 3 · Mistral · Closed · 34.7%
113. Nemotron 3 Nano 30B · NVIDIA · Open weight · 33.7%
114. Trinity-Large-Preview · Arcee AI · Open weight · 33.0%
115. Trinity-Large-Thinking · Arcee AI · Open weight · 33.0%
116. MiMo-V2-Flash · Xiaomi · Open weight · 31.3%
117. GPT-OSS 20B · OpenAI · Open weight · 30.7%
118. Gemma 4 E4B · Google · Open weight · 30.7%
119. DeepSeek V3 · DeepSeek · Open weight · 29.0%
120. Mistral Medium 3 · Mistral · Closed · 28.0%
121. GLM-4.6 · Z.AI · Open weight · 26.3%
122. Llama 4 Scout · Meta · Open weight · 25.8%
123. Ling 2.6 Flash · InclusionAI · Open weight · 25.0%
124. Llama 3.1 405B · Meta · Open weight · 24.3%
125. Ministral 3 8B (Reasoning) · Mistral · Open weight · 24.0%
126. Ministral 3 8B · Mistral · Open weight · 24.0%
127. Grok 4.1 Fast · xAI · Closed · 22.0%
128. Ministral 3 14B (Reasoning) · Mistral · Open weight · 22.0%
129. Ministral 3 14B · Mistral · Open weight · 22.0%
130. Claude 3 Haiku · Anthropic · Closed · 21.0%
131. Nova Pro · Amazon · Closed · 19.0%
132. GPT-4.1 nano · OpenAI · Closed · 17.0%
133. Gemma 4 E2B · Google · Open weight · 15.0%
134. Ministral 3 3B (Reasoning) · Mistral · Open weight · 11.7%
135. Ministral 3 3B · Mistral · Open weight · 11.7%
136. DeepSeek R1 Distill Qwen 32B · DeepSeek · Open weight · 9.7%
137. Exaone 4.0 32B · LG AI Research · Open weight · 8.0%
138. Nemotron Ultra 253B · NVIDIA · Open weight · 7.3%
139. Granite-4.0-H-1B · IBM · Open weight · 6.3%
140. Gemma 3 27B · Google · Open weight · 5.7%
141. Mistral Large 2 · Mistral · Closed · 5.3%
142. Granite-4.0-1B · IBM · Open weight · 4.0%
143. LFM2.5-8B-A1B · LiquidAI · Open weight · 0.0%
144. GPT-4o · OpenAI · Closed · 0.0%
145. Phi-4 · Microsoft · Open weight · 0.0%
146. LFM2.5-VL-1.6B-Extract · LiquidAI · Open weight · 0.0%
147. Sarvam 105B · Sarvam · Open weight · 0.0%
148. Sarvam 30B · Sarvam · Open weight · 0.0%
149. Granite-4.0-350M · IBM · Open weight · 0.0%
150. Granite-4.0-H-350M · IBM · Open weight · 0.0%
151. Exaone 4.0 1.2B · LG AI Research · Open weight · 0.0%
152. Solar Pro 2 · Upstage · Closed · 0.0%
153. LFM2-24B-A2B · LiquidAI · Closed · 0.0%
154. LFM2.5-1.2B-Thinking · LiquidAI · Closed · 0.0%
155. LFM2.5-1.2B-Instruct · LiquidAI · Closed · 0.0%

The published AA-LCR snapshot places GPT-5.2-Codex first at 75.7%. GPT-5.1, in third place, trails it by 0.7 points, and the top ten models span just 1.7 points (75.7% to 74.0%), so the leading results sit in a narrow band.
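As a quick check, the gap and range quoted above can be recomputed from the table. A minimal Python sketch; the scores are copied from this snapshot, and the variable names are illustrative:

```python
# Top-10 AA-LCR scores (percent) from the July 23, 2026 snapshot, in table order.
top10 = [75.7, 75.6, 75.0, 74.7, 74.3, 74.0, 74.0, 74.0, 74.0, 74.0]

# Gap between first and third place; round() hides floating-point noise.
third_place_gap = round(top10[0] - top10[2], 1)

# Spread of the top ten: highest score minus lowest score.
top10_range = round(max(top10) - min(top10), 1)

print(f"Leader vs. third place: {third_place_gap} points")  # 0.7
print(f"Top-10 range: {top10_range} points")  # 1.7
```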

A total of 155 models have been evaluated on AA-LCR. The benchmark belongs to the Reasoning category, which carries a 17% weight in BenchLM.ai's overall scoring system. AA-LCR itself, however, is displayed for reference only and is excluded from the scoring formula, so it does not directly affect overall rankings.
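To illustrate what "display-only" means for an overall score, here is a hypothetical Python sketch. This page does not describe BenchLM's actual formula: the weighted-average approach, the `Result` type, the lumped "Other" category, and all benchmark names and scores other than AA-LCR's 75.7% are assumptions made for illustration. Only the 17% Reasoning weight comes from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    benchmark: str
    category: str
    score: float        # percent
    display_only: bool  # True: shown on the page, excluded from scoring

# Hypothetical weights: only the 17% Reasoning weight comes from this page;
# "Other" lumps every remaining category together so the weights sum to 1.
WEIGHTS = {"Reasoning": 0.17, "Other": 0.83}

def category_averages(results):
    """Average the scored results in each category, skipping display-only rows."""
    buckets = {}
    for r in results:
        if r.display_only:
            continue  # e.g. AA-LCR: visible on the page, never enters the formula
        buckets.setdefault(r.category, []).append(r.score)
    return {cat: sum(s) / len(s) for cat, s in buckets.items()}

def overall_score(results):
    """Weighted average of category averages, renormalized over categories present."""
    avgs = category_averages(results)
    total_weight = sum(WEIGHTS[cat] for cat in avgs)
    return sum(WEIGHTS[cat] * avg for cat, avg in avgs.items()) / total_weight

# Illustrative inputs: the two "Hypothetical..." benchmarks and their scores are made up.
results = [
    Result("AA-LCR", "Reasoning", 75.7, display_only=True),
    Result("HypotheticalReasoningBench", "Reasoning", 60.0, display_only=False),
    Result("HypotheticalOtherBench", "Other", 80.0, display_only=False),
]
print(round(overall_score(results), 2))  # 76.6
```

Because the display-only row is filtered out before averaging, changing AA-LCR's score in this sketch leaves the overall value untouched, while the Reasoning category still contributes its 17% through the scored benchmarks.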

About AA-LCR

Year: 2026
Tasks: Long-context reasoning tasks
Format: Accuracy
Difficulty: Long-context reasoning

BenchLM records an AA-LCR score as a display-only entry when OpenRouter or Artificial Analysis publishes the exact long-context reasoning value for that model.

Source: Artificial Analysis model benchmarks

BenchLM freshness & provenance

Version: AA-LCR 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
Status: Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
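The three-way decision can be pictured as a small classifier over the freshness fields listed above. This is a hypothetical Python sketch: the `Freshness` record mirrors the provenance block on this page, but the rules inside `classify` are invented for illustration and are not BenchLM's published policy.

```python
from dataclasses import dataclass
from enum import Enum

class Treatment(Enum):
    STRONG_DIFFERENTIATOR = "strong differentiator"
    WATCH = "benchmark to watch"
    DISPLAY_ONLY = "display-only reference"

@dataclass(frozen=True)
class Freshness:
    # Fields mirror the "freshness & provenance" block on this page.
    version: str
    refresh_cadence: str
    staleness: str              # e.g. "Current"
    question_availability: str
    display_only: bool

def classify(meta: Freshness) -> Treatment:
    """Hypothetical decision rules, for illustration only."""
    if meta.display_only:
        return Treatment.DISPLAY_ONLY           # shown for reference, never scored
    if meta.staleness == "Current":
        return Treatment.STRONG_DIFFERENTIATOR  # fresh and scored
    return Treatment.WATCH                      # scored but ageing

aa_lcr = Freshness("AA-LCR 2026", "Quarterly", "Current", "Public benchmark set", True)
print(classify(aa_lcr).value)  # display-only reference
```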

FAQ

What does AA-LCR measure?

A display-only Artificial Analysis long-context reasoning evaluation.

Which model scores highest on AA-LCR?

GPT-5.2-Codex by OpenAI currently leads with a score of 75.7% on AA-LCR.

How many models are evaluated on AA-LCR?

BenchLM lists published AA-LCR scores for 155 AI models.


Last updated: July 23, 2026 · BenchLM version AA-LCR 2026
