Benchmark profile

GDPval-AA normalized (GDPval-AA)

A display-only Artificial Analysis normalized score for economically valuable tasks.

Data verified July 23, 2026

Benchmark score on GDPval-AA — July 23, 2026

BenchLM mirrors the published score view for GDPval-AA. Claude Fable 5 leads the public snapshot at 62.4%, followed by GPT-5.6 Sol (61.8%) and Kimi K3 (59.0%). BenchLM does not use these results to rank models overall.

1. Claude Fable 5 · Anthropic · Closed · claude-fable-5 · 62.4% · Overall 83.68 · Context 1M+

2. GPT-5.6 Sol · OpenAI · Closed · gpt-5-6-sol · 61.8% · Overall 81.96 · Context 1M

3. Kimi K3 · Moonshot AI · Closed · kimi-3 · 59.0% · Overall 80.96 · Context 1.05M

82 models · Agentic · Current · Display only · Updated July 23, 2026

Benchmark score table (82 models)

Model · Provider · License · Score

Claude Fable 5 · Anthropic · Closed · 62.4%
GPT-5.6 Sol · OpenAI · Closed · 61.8%
Kimi K3 · Moonshot AI · Closed · 59.0%
Claude Sonnet 5 · Anthropic · Closed · 55.4%
Claude Opus 4.8 · Anthropic · Closed · 54.7%
GPT-5.6 Luna · OpenAI · Closed · 54.2%
GPT-5.6 Terra · OpenAI · Closed · 54.1%
Grok 4.5 · xAI · Closed · 51.7%
GLM-5.2 · Z.AI · Open weight · 50.7%
Claude Opus 4.7 (Adaptive) · Anthropic · Closed · 49.8%
GPT-5.5 · OpenAI · Closed · 49.5%
Gemini 3.6 Flash · Google · Closed · 46.1%
GPT-5.4 · OpenAI · Closed · 44.7%
MiniMax M3 · MiniMax · Open weight · 44.7%
Muse Spark 1.1 · Meta · Closed · 43.7%
Gemini 3.5 Flash · Google · Closed · 42.4%
DeepSeek V4 Pro (Max) · DeepSeek · Open weight · 40.4%
DeepSeek V4 Pro (High) · DeepSeek · Open weight · 39.9%
Qwen3.7 Max · Alibaba · Closed · 38.7%
MiMo-V2.5-Pro · Xiaomi · Closed · 38.3%
GLM-5.1 · Z.AI · Open weight · 37.8%
Inkling · Thinking Machines Lab · Open weight · 36.9%
Hy3 Preview · Tencent · Open weight · 35.7%
Hy3 · Tencent · Open weight · 35.7%
Kimi K2.6 · Moonshot AI · Open weight · 34.5%
DeepSeek V4 Flash (Max) · DeepSeek · Open weight · 34.4%
Kimi K2.7 Code · Moonshot AI · Open weight · 34.3%
GPT-5.4 mini · OpenAI · Closed · 33.6%
GLM-4.7 · Z.AI · Open weight · 33.3%
Nemotron 3 Ultra · NVIDIA · Open weight · 33.2%
MiniMax M2.7 · MiniMax · Open weight · 32.9%
DeepSeek V4 Flash (High) · DeepSeek · Open weight · 32.4%
Muse Spark · Meta · Closed · 32.2%
Qwen3.6-27B · Alibaba · Open weight · 32.0%
Gemini 3.5 Flash-Lite · Google · Closed · 32.0%
Qwen3.6 Plus · Alibaba · Closed · 31.8%
GPT-5.4 nano · OpenAI · Closed · 30.0%
Grok 4.3 · xAI · Closed · 29.2%
GPT-5 (high) · OpenAI · Closed · 28.7%
Qwen3.6-35B-A3B · Alibaba · Open weight · 27.4%
Step 3.7 Flash · StepFun · Open weight · 25.9%
Kimi K2.5 · Moonshot AI · Open weight · 25.4%
Kimi K2.5 (Reasoning) · Moonshot AI · Closed · 25.4%
GPT-5.1 · OpenAI · Closed · 24.4%
Qwen3.5-122B-A10B · Alibaba · Open weight · 23.9%
Gemini 3.1 Pro · Google · Closed · 23.3%
Qwen3.5 397B · Alibaba · Open weight · 23.1%
Qwen3.5 397B (Reasoning) · Alibaba · Open weight · 23.1%
Qwen3.7 Plus · Alibaba · Closed · 21.8%
GPT-5 mini · OpenAI · Closed · 21.8%
Mistral Medium 3.5 128B · Mistral · Open weight · 21.4%
MiMo-V2-Flash · Xiaomi · Open weight · 16.7%
Gemma 4 31B · Google · Open weight · 15.2%
GPT-OSS 120B · OpenAI · Open weight · 15.0%
Gemma 4 26B A4B · Google · Open weight · 13.1%
Command A+ · Cohere · Open weight · 10.7%
Mercury 2 · Inception · Closed · 9.9%
Nemotron 3 Super 120B A12B · NVIDIA · Open weight · 9.7%
Gemini 2.5 Pro · Google · Closed · 8.3%
Gemini 3.1 Flash-Lite · Google · Closed · 7.1%
Mistral Large 3 · Mistral · Closed · 6.6%
Mistral Small 4 · Mistral · Open weight · 4.4%
Mistral Small 4 (Reasoning) · Mistral · Open weight · 4.4%
GPT-OSS 20B · OpenAI · Open weight · 3.0%
Trinity-Large-Preview · Arcee AI · Open weight · 2.7%
Trinity-Large-Thinking · Arcee AI · Open weight · 2.7%
Ling 2.6 Flash · InclusionAI · Open weight · 2.2%
GPT-4.1 mini · OpenAI · Closed · 0.1%
Nemotron 3 Nano Omni 30B A3B · NVIDIA · Open weight · 0.0%
GPT-4.1 nano · OpenAI · Closed · 0.0%
DeepSeek V3 · DeepSeek · Open weight · 0.0%
Llama 4 Scout · Meta · Open weight · 0.0%
Llama 4 Maverick · Meta · Open weight · 0.0%
Gemma 3 27B · Google · Open weight · 0.0%
Nemotron 3 Nano 30B · NVIDIA · Open weight · 0.0%
GPT-4o mini · OpenAI · Closed · 0.0%
Ministral 3 14B (Reasoning) · Mistral · Open weight · 0.0%
Ministral 3 14B · Mistral · Open weight · 0.0%
Ministral 3 8B (Reasoning) · Mistral · Open weight · 0.0%
Ministral 3 8B · Mistral · Open weight · 0.0%
Ministral 3 3B (Reasoning) · Mistral · Open weight · 0.0%
Ministral 3 3B · Mistral · Open weight · 0.0%

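The published GDPval-AA snapshot places Claude Fable 5 first at 62.4%. GPT-5.6 Sol is 0.6 points behind in second, and Kimi K3 is 3.4 points behind in third. The top 10 span 12.6 points, from 62.4% down to 49.8% for Claude Opus 4.7 (Adaptive), so the table still separates the leading published systems.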
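The gaps quoted here can be checked with a few lines of arithmetic. The sketch below copies the ten highest scores from the table on this page; the `gap` helper and the variable names are illustrative only and are not part of any BenchLM tooling.

```python
# Top-10 GDPval-AA scores (percent), copied from the score table on this page.
top10 = [
    ("Claude Fable 5", 62.4),
    ("GPT-5.6 Sol", 61.8),
    ("Kimi K3", 59.0),
    ("Claude Sonnet 5", 55.4),
    ("Claude Opus 4.8", 54.7),
    ("GPT-5.6 Luna", 54.2),
    ("GPT-5.6 Terra", 54.1),
    ("Grok 4.5", 51.7),
    ("GLM-5.2", 50.7),
    ("Claude Opus 4.7 (Adaptive)", 49.8),
]


def gap(rows, i, j):
    """Score difference in percentage points between rows i and j (0-based).

    Rounded to one decimal to hide floating-point noise.
    """
    return round(rows[i][1] - rows[j][1], 1)


first_to_second = gap(top10, 0, 1)  # leader vs. second row
first_to_third = gap(top10, 0, 2)   # leader vs. third row
top10_range = gap(top10, 0, 9)      # leader vs. tenth row

print(f"1st to 2nd: {first_to_second} points")
print(f"1st to 3rd: {first_to_third} points")
print(f"Top-10 range: {top10_range} points")
```

These are differences in percentage points on the normalized score, not relative percentages.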

82 models have been evaluated on GDPval-AA. The benchmark falls in the Agentic category. This category carries a 22% weight in BenchLM.ai's overall scoring system. GDPval-AA is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
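To make "displayed but excluded" concrete, here is a minimal sketch of how a display-only flag could keep a benchmark out of a weighted category score. Only the 22% Agentic weight and the 62.4% GDPval-AA score come from this page. The plain-average aggregation, the `display_only` field, and the two other benchmark entries are assumptions made up for illustration; BenchLM's actual formula is described on its methodology page and may differ.

```python
AGENTIC_WEIGHT = 0.22  # category weight stated on this page


def category_contribution(benchmarks, weight):
    """Weighted contribution of one category to an overall score.

    Assumption: the category score is a plain average of the benchmarks
    that are not flagged display-only. This is NOT BenchLM's published formula.
    """
    scored = [b["score"] for b in benchmarks if not b["display_only"]]
    if not scored:
        return 0.0
    return weight * sum(scored) / len(scored)


# Hypothetical Agentic benchmarks for one model; only GDPval-AA is real.
agentic = [
    {"name": "hypothetical-benchmark-a", "score": 70.0, "display_only": False},
    {"name": "hypothetical-benchmark-b", "score": 80.0, "display_only": False},
    {"name": "GDPval-AA", "score": 62.4, "display_only": True},
]

before = category_contribution(agentic, AGENTIC_WEIGHT)

# Changing the display-only score leaves the contribution untouched.
agentic[2]["score"] = 0.0
after = category_contribution(agentic, AGENTIC_WEIGHT)

print(before == after)
```

Under these assumptions, any change to the GDPval-AA value leaves the category contribution unchanged, which is what "does not directly affect overall rankings" means in practice.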

About GDPval-AA

Year: 2026

Tasks: Economically valuable tasks

Format: Normalized score

Difficulty: Professional agentic workflows

OpenRouter's Grok 4.3 benchmark card displays GDPval-AA as a normalized percentage. BenchLM stores it separately from the Elo-style GDPval-AA rows used in provider comparison tables.

Artificial Analysis model benchmarks

BenchLM freshness & provenance

Version: GDPval-AA 2026

Refresh cadence: Quarterly

Staleness state: Current

Question availability: Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

FAQ

What does GDPval-AA measure?

A display-only Artificial Analysis normalized score for economically valuable tasks.

Which model scores highest on GDPval-AA?

Claude Fable 5 by Anthropic currently leads with a score of 62.4% on GDPval-AA.

How many models are evaluated on GDPval-AA?

82 AI models have been evaluated on GDPval-AA on BenchLM.

Compare Top Models on GDPval-AA

Claude Fable 5 vs GPT-5.6 Sol · GPT-5.6 Sol vs Kimi K3 · Kimi K3 vs Claude Sonnet 5 · Claude Sonnet 5 vs Claude Opus 4.8

Last updated: July 23, 2026 · BenchLM version GDPval-AA 2026
