Benchmark profile

Artificial Analysis Agentic Index (AA Agentic Index)

A display-only Artificial Analysis agentic index.

Data verified July 23, 2026

Benchmark score on AA Agentic Index — July 23, 2026

BenchLM mirrors the published score view for AA Agentic Index. GPT-5.6 Sol leads the public snapshot at 54.0%, followed by Claude Fable 5 (52.8%) and Kimi K3 (50.1%). BenchLM does not use these results to rank models overall.

1. GPT-5.6 Sol · OpenAI · Closed
   gpt-5-6-sol · 54.0% · Overall 81.96 · Context 1M

2. Claude Fable 5 · Anthropic · Closed
   claude-fable-5 · 52.8% · Overall 83.68 · Context 1M+

3. Kimi K3 · Moonshot AI · Closed
   kimi-3 · 50.1% · Overall 80.96 · Context 1.05M

79 models · Agentic · Current · Display only · Updated July 23, 2026

Benchmark score table (79 models)

Model · Developer · License · Score

GPT-5.6 Sol · OpenAI · Closed · 54.0%
Claude Fable 5 · Anthropic · Closed · 52.8%
Kimi K3 · Moonshot AI · Closed · 50.1%
GPT-5.6 Terra · OpenAI · Closed · 47.4%
Claude Opus 4.8 · Anthropic · Closed · 47.2%
Claude Sonnet 5 · Anthropic · Closed · 46.7%
Grok 4.5 · xAI · Closed · 45.7%
GPT-5.6 Luna · OpenAI · Closed · 45.6%
GPT-5.5 · OpenAI · Closed · 44.9%
Claude Opus 4.7 (Adaptive) · Anthropic · Closed · 44.4%
GLM-5.2 · Z.AI · Open weight · 43.1%
GPT-5.4 · OpenAI · Closed · 41.1%
Gemini 3.6 Flash · Google · Closed · 38.7%
Muse Spark 1.1 · Meta · Closed · 37.5%
Gemini 3.5 Flash · Google · Closed · 37.5%
DeepSeek V4 Pro (Max) · DeepSeek · Open weight · 36.4%
MiniMax M3 · MiniMax · Open weight · 35.4%
DeepSeek V4 Pro (High) · DeepSeek · Open weight · 34.4%
Inkling · Thinking Machines Lab · Open weight · 32.3%
DeepSeek V4 Flash (Max) · DeepSeek · Open weight · 31.1%
Hy3 Preview · Tencent · Open weight · 30.7%
Hy3 · Tencent · Open weight · 30.7%
Qwen3.7 Max · Alibaba · Closed · 30.6%
Kimi K2.6 · Moonshot AI · Open weight · 30.3%
GPT-5.4 mini · OpenAI · Closed · 30.2%
GLM-5.1 · Z.AI · Open weight · 29.9%
Kimi K2.7 Code · Moonshot AI · Open weight · 29.6%
MiMo-V2.5-Pro · Xiaomi · Closed · 29.1%
Muse Spark · Meta · Closed · 28.7%
DeepSeek V4 Flash (High) · DeepSeek · Open weight · 28.2%
Qwen3.6 Plus · Alibaba · Closed · 27.6%
GPT-5.4 nano · OpenAI · Closed · 27.5%
Nemotron 3 Ultra · NVIDIA · Open weight · 27.4%
Qwen3.6-27B · Alibaba · Open weight · 27.0%
Gemini 3.5 Flash-Lite · Google · Closed · 26.8%
GPT-5 (high) · OpenAI · Closed · 25.7%
MiniMax M2.7 · MiniMax · Open weight · 25.6%
GLM-4.7 · Z.AI · Open weight · 25.4%
Grok 4.3 · xAI · Closed · 24.1%
Kimi K2.5 · Moonshot AI · Open weight · 21.7%
Kimi K2.5 (Reasoning) · Moonshot AI · Closed · 21.7%
Step 3.7 Flash · StepFun · Open weight · 21.5%
Qwen3.6-35B-A3B · Alibaba · Open weight · 21.4%
Gemini 3.1 Pro · Google · Closed · 21.4%
GPT-5.1 · OpenAI · Closed · 21.0%
Qwen3.7 Plus · Alibaba · Closed · 20.8%
Qwen3.5-122B-A10B · Alibaba · Open weight · 20.7%
Qwen3.5 397B · Alibaba · Open weight · 19.9%
Qwen3.5 397B (Reasoning) · Alibaba · Open weight · 19.9%
GPT-5 mini · OpenAI · Closed · 19.4%
Mistral Medium 3.5 128B · Mistral · Open weight · 19.0%
Gemma 4 31B · Google · Open weight · 14.4%
GPT-OSS 120B · OpenAI · Open weight · 13.2%
MiMo-V2-Flash · Xiaomi · Open weight · 12.0%
Gemma 4 26B A4B · Google · Open weight · 11.0%
Mercury 2 · Inception · Closed · 9.6%
Command A+ · Cohere · Open weight · 9.2%
Nemotron 3 Super 120B A12B · NVIDIA · Open weight · 8.7%
Gemini 2.5 Pro · Google · Closed · 7.1%
Gemini 3.1 Flash-Lite · Google · Closed · 6.2%
Mistral Large 3 · Mistral · Closed · 5.5%
Mistral Small 4 · Mistral · Open weight · 4.7%
Mistral Small 4 (Reasoning) · Mistral · Open weight · 4.7%
GPT-OSS 20B · OpenAI · Open weight · 3.1%
Ling 2.6 Flash · InclusionAI · Open weight · 2.3%
Ministral 3 14B (Reasoning) · Mistral · Open weight · 2.2%
Ministral 3 14B · Mistral · Open weight · 2.2%
Nemotron 3 Nano 30B · NVIDIA · Open weight · 2.0%
GPT-4.1 mini · OpenAI · Closed · 1.7%
DeepSeek V3 · DeepSeek · Open weight · 1.6%
Ministral 3 3B (Reasoning) · Mistral · Open weight · 1.6%
Ministral 3 3B · Mistral · Open weight · 1.6%
Llama 4 Maverick · Meta · Open weight · 1.3%
Ministral 3 8B (Reasoning) · Mistral · Open weight · 1.2%
Ministral 3 8B · Mistral · Open weight · 1.2%
GPT-4.1 nano · OpenAI · Closed · 1.2%
Llama 4 Scout · Meta · Open weight · 1.1%
GPT-4o mini · OpenAI · Closed · 1.0%
Gemma 3 27B · Google · Open weight · 0.3%

The published AA Agentic Index snapshot places GPT-5.6 Sol first at 54.0%. Third-place Kimi K3 (50.1%) trails by 3.9 points, and the top ten span 9.6 points, from 54.0% down to 44.4% for Claude Opus 4.7 (Adaptive), so the leading results sit in a relatively narrow band. The full table is far wider, running down to 0.3%.
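The two gaps quoted above can be checked against the score table. The snippet below is a minimal sketch: the ten values are copied from the top of the table, and the helper name `gap` is only illustrative.

```python
# Top-10 scores (percent), copied in order from the published score table.
top10 = [54.0, 52.8, 50.1, 47.4, 47.2, 46.7, 45.7, 45.6, 44.9, 44.4]


def gap(scores, i, j):
    """Return scores[i] - scores[j] in percentage points, rounded to 1 decimal.

    Rounding hides floating-point noise such as 3.8999999999999986.
    """
    return round(scores[i] - scores[j], 1)


first_to_third = gap(top10, 0, 2)   # first row versus third row
top10_range = gap(top10, 0, 9)      # first row versus tenth row

print(first_to_third, top10_range)  # prints: 3.9 9.6
```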

79 models have been evaluated on AA Agentic Index. The benchmark falls in the Agentic category. This category carries a 22% weight in BenchLM.ai's overall scoring system. AA Agentic Index is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
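What "displayed but excluded" means in practice can be sketched as follows. Only the 22% category weight and the 54.0% AA Agentic Index score come from this page. The aggregation rule (a plain mean of scored rows, multiplied by the category weight), the `BenchmarkRow` type, and the two `hypothetical-*` benchmarks with their scores are assumptions made up for illustration, not BenchLM's actual formula.

```python
from dataclasses import dataclass

AGENTIC_WEIGHT = 0.22  # category weight stated on this page


@dataclass
class BenchmarkRow:
    name: str
    score: float        # percent, 0-100
    display_only: bool  # True: mirrored for reference, never scored


def category_score(rows):
    """Mean of scored rows only (assumed rule); display-only rows are skipped."""
    scored = [r.score for r in rows if not r.display_only]
    return sum(scored) / len(scored) if scored else 0.0


def agentic_contribution(rows):
    """Assumed contribution of the Agentic category to an overall score."""
    return AGENTIC_WEIGHT * category_score(rows)


# Hypothetical scored benchmarks; names and values are invented.
scored_rows = [
    BenchmarkRow("hypothetical-agentic-bench-a", 70.0, display_only=False),
    BenchmarkRow("hypothetical-agentic-bench-b", 60.0, display_only=False),
]
# Same rows plus the mirrored AA Agentic Index result for GPT-5.6 Sol.
with_aa = scored_rows + [BenchmarkRow("AA Agentic Index", 54.0, display_only=True)]

# Both calls return the same value: the display-only row changes nothing.
print(agentic_contribution(scored_rows), agentic_contribution(with_aa))
```

Under these assumptions, adding or removing the AA Agentic Index row leaves the category contribution unchanged, which matches the statement that the index does not directly affect overall rankings.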

About AA Agentic Index

Year: 2026

Tasks: Cross-benchmark agentic index

Format: Aggregated model score

Difficulty: Display-only external reference

BenchLM mirrors this agentic index for comparison, but does not use it as a weighted agentic benchmark row.

Source: Artificial Analysis model leaderboards

BenchLM freshness & provenance

Version: AA Agentic Index 2026

Refresh cadence: Quarterly

Staleness state: Current

Question availability: Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

FAQ

What does AA Agentic Index measure?

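AA Agentic Index is an aggregated, cross-benchmark agentic index published by Artificial Analysis. BenchLM mirrors it as a display-only reference and does not use it as a weighted agentic benchmark.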

Which model scores highest on AA Agentic Index?

GPT-5.6 Sol by OpenAI currently leads with a score of 54.0% on AA Agentic Index.

How many models are evaluated on AA Agentic Index?

BenchLM lists AA Agentic Index scores for 79 AI models.


Last updated: July 23, 2026 · BenchLM version AA Agentic Index 2026
