BenchLM recommendation

Best Value Agentic AI Model in 2026 — Cost-Adjusted Rankings

Data verified July 23, 2026

As of July 23, 2026, the top model in BenchLM's best value agentic AI model ranking is Ministral 3 3B, with a Score/$ of 208.1.


Agentic workloads are token-intensive: agents loop, retry, and chain multiple calls. That makes cost per token a critical factor alongside raw capability. This ranking divides each model's weighted agentic score (Terminal-Bench 2.0, BrowseComp, OSWorld-Verified) by its output price per 1M tokens, so the result shows which models deliver the most agent capability per dollar. If you're building production AI agents on a budget, start here.
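A minimal Python sketch of the ranking rule described above, assuming each model is a record holding its weighted agentic score and its output price in US dollars per 1M tokens. The `Model` class and the helper functions are illustrative, not a BenchLM API. The three sample rows use the scores and prices listed for the top three models on this page. Because displayed scores are rounded to one decimal, a recomputed ratio can differ slightly from the published one: 20.8 / 0.1 gives 208.0, while the page lists 208.1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Model:
    name: str
    score: float         # weighted agentic score (0-100 scale assumed)
    output_price: float  # USD per 1M output tokens


def score_per_dollar(m: Model) -> float:
    """Value metric: weighted agentic score divided by output price per 1M tokens."""
    return m.score / m.output_price


def rank_by_value(models: List[Model]) -> List[Tuple[str, float]]:
    """Return (name, Score/$) pairs, best value first."""
    scored = [(m.name, round(score_per_dollar(m), 2)) for m in models]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Scores and prices as displayed on this page (scores are rounded).
models = [
    Model("DeepSeek V4 Flash (High)", 45.7, 0.28),
    Model("Ministral 3 3B", 20.8, 0.10),
    Model("DeepSeek V4 Flash (Max)", 49.5, 0.28),
]

for rank, (name, value) in enumerate(rank_by_value(models), start=1):
    print(f"{rank}. {name}: {value} Score/$")
```

Sorting on the ratio alone is the whole method; any tie-breaking rule (for example, preferring the higher raw score) would be an extra choice not described on this page.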

Unless noted otherwise, the rankings on this page use BenchLM's provisional leaderboard rather than the stricter verified leaderboard, which includes only sourced results.

Bottom line: Agentic tasks are token-intensive, so value matters more here than almost anywhere. Ministral 3 3B leads, while DeepSeek V4 Flash (Max) pairs the #2 value (176.68 Score/$) with more than double Ministral's raw score (49.5 vs 20.8).
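To make the token-intensity point concrete, here is a rough sketch that converts a per-1M-token output price into the output cost of one agent run. The run shape (40 calls averaging 1,500 output tokens each) is a hypothetical workload, and input tokens are ignored, mirroring the ranking's output-price-only basis. The prices are the ones listed in the rankings on this page.

```python
def output_cost_per_run(calls: int, output_tokens_per_call: int, price_per_million: float) -> float:
    """Estimated USD spent on output tokens for one agent run (input tokens ignored)."""
    total_tokens = calls * output_tokens_per_call
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical agent run: 40 loop iterations, about 1,500 output tokens each.
CALLS, TOKENS_PER_CALL = 40, 1_500

# Output prices (USD per 1M tokens) as listed in the rankings on this page.
prices = {
    "Ministral 3 3B": 0.10,
    "GPT-4.1 nano": 0.40,
    "Claude Sonnet 4.6": 15.0,
    "GPT-5.5 Pro": 180.0,
}

for name, price in prices.items():
    cost = output_cost_per_run(CALLS, TOKENS_PER_CALL, price)
    # Per-run cost, then the same workload repeated 10,000 times.
    print(f"{name}: ${cost:.4f} per run, ${cost * 10_000:,.2f} per 10,000 runs")
```

Because the loop multiplies every call, a price gap of 1,800x between the cheapest and most expensive models here carries straight through to the per-run bill.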

Ministral 3 3B leads this ranking with a Score/$ of 208.1, followed by DeepSeek V4 Flash (Max) (176.68) and DeepSeek V4 Flash (High) (163.32). The top six models all exceed 130 Score/$; after GPT-4.1 nano at #7 (100.57), values fall to 70.45 and below.

The best open-weight option is Ministral 3 3B (ranked #1 with a Score/$ of 208.1). Open-weight models are highly competitive in this category, holding the top six spots, and self-hosting them is a viable alternative to proprietary APIs.

This ranking is based on provisional weighted averages of the agentic benchmarks scored by BenchLM.ai.
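This page does not state how the three agentic benchmarks are weighted, so the sketch below uses hypothetical equal weights purely for illustration. It also assumes, as a guess at how a provisional average might handle incomplete data, that the weights are renormalized over whichever benchmarks a model actually has results for. The benchmark scores in the example are made-up numbers, not real results.

```python
from typing import Dict, Optional

# Hypothetical weights: the actual BenchLM weighting is not stated on this page.
WEIGHTS: Dict[str, float] = {
    "Terminal-Bench 2.0": 1 / 3,
    "BrowseComp": 1 / 3,
    "OSWorld-Verified": 1 / 3,
}


def weighted_agentic_score(results: Dict[str, float]) -> Optional[float]:
    """Weighted mean of the benchmark results a model actually has.

    Assumption: weights are renormalized over the benchmarks present, so a
    model missing one benchmark is averaged over the remaining ones.
    Returns None if the model has no result on any scored benchmark.
    """
    present = {name: value for name, value in results.items() if name in WEIGHTS}
    total_weight = sum(WEIGHTS[name] for name in present)
    if total_weight == 0:
        return None
    return sum(WEIGHTS[name] * value for name, value in present.items()) / total_weight


# Illustrative numbers only, not real benchmark results.
full = {"Terminal-Bench 2.0": 40.0, "BrowseComp": 50.0, "OSWorld-Verified": 60.0}
partial = {"Terminal-Bench 2.0": 40.0, "BrowseComp": 50.0}

print(weighted_agentic_score(full))
print(weighted_agentic_score(partial))
print(weighted_agentic_score({}))  # None
```

Renormalizing keeps partially benchmarked models on the same 0-100 scale, but it also means their score rests on fewer tasks, which is one reason a provisional lane differs from a sourced-only one.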


What changed

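Ministral 3 3B leads agentic value: the most agent capability per dollar (208.1 Score/$).

GPT-4.1 nano is the strongest agentic value in OpenAI's lineup (#7, 100.57 Score/$), ahead of GPT-4o mini (#11, 56.08).

DeepSeek V4 Flash (Max) has the highest raw agentic score in the top five (49.5) at Flash-tier pricing ($0.28/1M).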

How to choose

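Cheapest agentic model?

Ministral 3 3B: $0.1/1M output tokens, the most agent loops per dollar

Agentic value in the OpenAI ecosystem?

GPT-4.1 nano: best OpenAI value for agents (100.57 Score/$)

Best raw agentic performance?

See the standard agentic leaderboard; in this list, GPT-5.6 Sol posts the highest raw score (75.2 at $30/1M)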
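The three questions above can be read as three selection rules. This sketch applies them to a few rows copied from the full rankings on this page; the tuple layout and function names are illustrative, not part of any BenchLM tool.

```python
# (name, provider, weighted agentic score, output USD per 1M tokens):
# a few rows copied from the full rankings on this page.
ROWS = [
    ("Ministral 3 3B", "Mistral", 20.8, 0.10),
    ("DeepSeek V4 Flash (Max)", "DeepSeek", 49.5, 0.28),
    ("GPT-4.1 nano", "OpenAI", 40.2, 0.40),
    ("GPT-4o mini", "OpenAI", 33.7, 0.60),
    ("GPT-5.6 Sol", "OpenAI", 75.2, 30.0),
]


def cheapest(rows):
    """Lowest output price per 1M tokens."""
    return min(rows, key=lambda r: r[3])[0]


def best_value(rows, provider=None):
    """Highest score / price, optionally restricted to one provider."""
    pool = [r for r in rows if provider is None or r[1] == provider]
    return max(pool, key=lambda r: r[2] / r[3])[0]


def best_raw(rows):
    """Highest weighted agentic score, ignoring price."""
    return max(rows, key=lambda r: r[2])[0]


print("Cheapest:", cheapest(ROWS))
print("Best OpenAI value:", best_value(ROWS, provider="OpenAI"))
print("Best raw score:", best_raw(ROWS))
```

Only the provider filter changes between the overall and ecosystem-specific answers; the value rule itself stays the same.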

Full Rankings (71 models)

Rank · Model · Provider · License · Context · Score/$ · Score · Output price
1 · Ministral 3 3B · Mistral · Open Weight · 128K · 208.1 · 20.8 · $0.1/1M
2 · DeepSeek V4 Flash (Max) · DeepSeek · Open Weight · 1M · 176.68 · 49.5 · $0.28/1M
3 · DeepSeek V4 Flash (High) · DeepSeek · Open Weight · 1M · 163.32 · 45.7 · $0.28/1M
4 · Ministral 3 14B · Mistral · Open Weight · 128K · 158.55 · 31.7 · $0.2/1M
5 · Ministral 3 8B · Mistral · Open Weight · 128K · 150.07 · 22.5 · $0.15/1M
6 · DeepSeek V4 Flash · DeepSeek · Open Weight · 1M · 130.04 · 36.4 · $0.28/1M
7 · GPT-4.1 nano · OpenAI · Proprietary · 1M · 100.57 · 40.2 · $0.4/1M
8 · Mistral Small 4 · Mistral · Open Weight · 256K · 70.45 · 42.3 · $0.6/1M
9 · DeepSeek V4 Pro (Max) · DeepSeek · Open Weight · 1M · 61.94 · 53.9 · $0.87/1M
10 · DeepSeek V4 Pro (High) · DeepSeek · Open Weight · 1M · 60.64 · 52.8 · $0.87/1M
11 · GPT-4o mini · OpenAI · Proprietary · 128K · 56.08 · 33.7 · $0.6/1M
12 · DeepSeek V4 Pro · DeepSeek · Open Weight · 1M · 50.32 · 43.8 · $0.87/1M
13 · Step 3.7 Flash · StepFun · Open Weight · 256K · 39.56 · 45.5 · $1.15/1M
14 · MiniMax M2.5 · MiniMax · Proprietary · 128K · 36.86 · 44.2 · $1.2/1M
15 · DeepSeek V3 · DeepSeek · Open Weight · 128K · 35.02 · 38.5 · $1.1/1M
16 · MiniMax M3 · MiniMax · Open Weight · 1M · 34.03 · 40.8 · $1.2/1M
17 · GPT-5.4 nano · OpenAI · Proprietary · 400K · 33.27 · 41.6 · $1.25/1M
18 · MiniMax M2.7 · MiniMax · Open Weight · 200K · 28.97 · 34.8 · $1.2/1M
19 · Mistral Large 3 · Mistral · Proprietary · 128K · 28.74 · 43.1 · $1.5/1M
20 · GPT-4.1 mini · OpenAI · Proprietary · 1M · 25.59 · 41 · $1.6/1M
21 · GPT-5 mini · OpenAI · Proprietary · 128K · 21.24 · 42.5 · $2/1M
22 · Composer 2.5 · Cursor · Proprietary · 200K · 20.96 · 52.4 · $2.5/1M
23 · Composer 2 · Cursor · Proprietary · 200K · 20.48 · 51.2 · $2.5/1M
24 · Qwen3.5 Plus · Alibaba · Proprietary · 1M · 20.05 · 48.1 · $2.4/1M
25 · Holo3-122B-A10B · H Company · Proprietary · 64K · 17.7 · 53.1 · $3/1M
26 · GLM-5 · Z.AI · Open Weight · 200K · 17.13 · 54.8 · $3.2/1M
27 · Kimi K2.5 (Reasoning) · Moonshot AI · Proprietary · 128K · 16.36 · 49.1 · $3/1M
28 · Grok Build 0.1 · xAI · Proprietary · 256K · 14.29 · 28.6 · $2/1M
29 · Kimi K2.5 · Moonshot AI · Open Weight · 256K · 13.21 · 39.6 · $3/1M
30 · Qwen3.5 397B · Alibaba · Open Weight · 128K · 12.8 · 46.1 · $3.6/1M
31 · GLM-5.2 · Z.AI · Open Weight · 1M · 12.41 · 54.6 · $4.4/1M
32 · GLM-5.1 · Z.AI · Open Weight · 203K · 11.05 · 48.6 · $4.4/1M
33 · Claude Haiku 4.5 · Anthropic · Proprietary · 200K · 10.09 · 50.4 · $5/1M
34 · Grok 4.5 · xAI · Proprietary · 500K · ≈10.0 · 60 · $6/1M
35 · Inkling · Thinking Machines Lab · Open Weight · 1M · 9.8 · 45.9 · $4.68/1M
36 · GPT-5.6 Luna · OpenAI · Proprietary · 1M · 9.75 · 58.5 · $6/1M
37 · Kimi K2.6 · Moonshot AI · Open Weight · 256K · 9.51 · 38 · $4/1M
38 · Kimi K2.7 Code · Moonshot AI · Open Weight · 256K · 9.51 · 38 · $4/1M
39 · Gemini 3 Flash · Google · Proprietary · 1M · 8.39 · 25.2 · $3/1M
40 · GPT-5.4 mini · OpenAI · Proprietary · 400K · 8.34 · 37.5 · $4.5/1M
41 · Grok 4.20 · xAI · Proprietary · 2M · 8.27 · 49.6 · $6/1M
42 · Claude Sonnet 5 · Anthropic · Proprietary · 1M · 5.87 · 58.7 · $10/1M
43 · GPT-5 (high) · OpenAI · Proprietary · 128K · ≈5.0 · 50 · $10/1M
44 · Gemini 3 Pro · Google · Proprietary · 2M · 4.97 · 59.7 · $12/1M
45 · Gemini 3.5 Flash · Google · Proprietary · 1M · 4.93 · 44.4 · $9/1M
46 · GPT-5.6 Terra · OpenAI · Proprietary · 1M · 4.87 · 73.1 · $15/1M
47 · GPT-5.1 · OpenAI · Proprietary · 200K · 4.86 · 48.6 · $10/1M
48 · Gemini 2.5 Pro · Google · Proprietary · 1M · 4.81 · 48.1 · $10/1M
49 · Kimi K3 · Moonshot AI · Pending · 1.05M · 4.44 · 66.6 · $15/1M
50 · Command A+ · Cohere · Open Weight · 128K · 4.38 · 43.8 · $10/1M
51 · GPT-5.3 Codex · OpenAI · Proprietary · 400K · 4.37 · 61.2 · $14/1M
52 · GPT-5.4 · OpenAI · Proprietary · 1.05M · 3.8 · 57 · $15/1M
53 · GPT-5.2-Codex · OpenAI · Proprietary · 400K · 3.8 · 53.1 · $14/1M
54 · Claude Sonnet 4.6 · Anthropic · Proprietary · 200K · 3.41 · 51.2 · $15/1M
55 · Claude Sonnet 4.5 · Anthropic · Proprietary · 200K · 3.12 · 46.9 · $15/1M
56 · Gemini 3.1 Pro · Google · Proprietary · 1M · 3.11 · 37.3 · $12/1M
57 · GPT-5.2 · OpenAI · Proprietary · 400K · 2.97 · 41.6 · $14/1M
58 · Claude 4 Sonnet · Anthropic · Proprietary · 200K · 2.85 · 42.8 · $15/1M
59 · Claude Opus 4.7 (Adaptive) · Anthropic · Proprietary · 1M · 2.64 · 65.9 · $25/1M
60 · Claude Opus 4.8 · Anthropic · Proprietary · 1M · 2.56 · 64 · $25/1M
61 · Grok 4.3 · xAI · Proprietary · 1M · 2.52 · 6.3 · $2.5/1M
62 · GPT-5.6 Sol · OpenAI · Proprietary · 1M · 2.51 · 75.2 · $30/1M
63 · Claude Opus 4.6 · Anthropic · Proprietary · 1M · 2.21 · 55.2 · $25/1M
64 · GPT-5.5 · OpenAI · Proprietary · 1M · 2.19 · 65.8 · $30/1M
65 · Claude Opus 4.7 · Anthropic · Proprietary · 1M · 2.19 · 54.8 · $25/1M
66 · Claude Opus 4.5 · Anthropic · Proprietary · 200K · 1.44 · 35.9 · $25/1M
67 · Claude Mythos 5 · Anthropic · Proprietary · 1M+ · 1.3 · 65 · $50/1M
68 · Claude Fable 5 · Anthropic · Proprietary · 1M+ · 1.28 · 63.9 · $50/1M
69 · Claude 4.1 Opus · Anthropic · Proprietary · 200K · 0.61 · 45.7 · $75/1M
70 · GPT-5.5 Pro · OpenAI · Proprietary · 1M · 0.34 · 60.5 · $180/1M
71 · GPT-5.4 Pro · OpenAI · Proprietary · 1.05M · 0.32 · 58.2 · $180/1M

Key Takeaways

The best value model is Ministral 3 3B by Mistral with a provisional Score/$ ratio of 208.1 (score: 20.8, output: $0.1/1M tokens).

The best open-weight model is Ministral 3 3B at position #1.

71 models are included in this ranking.

Score in Context

What these scores mean

Value scores divide the weighted agentic score by output token price (per 1M tokens). Higher means more capability per dollar. Models with no listed price are excluded.
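A sketch of that exclusion rule, assuming a missing price is represented as `None`. Treating a zero or negative price as missing too is an extra assumption made here so the division can never fail; the page itself only says unpriced models are excluded. The catalog entries are hypothetical placeholders.

```python
from typing import Optional


def value_score(score: float, output_price: Optional[float]) -> Optional[float]:
    """Score per dollar, or None when the model has no usable listed price."""
    if output_price is None or output_price <= 0:
        return None  # excluded from the value ranking
    return score / output_price


# Hypothetical entries: (weighted agentic score, output USD per 1M tokens or None).
catalog = {
    "model-a": (42.0, 0.6),
    "model-b": (55.0, None),  # no listed price, so it is dropped
    "model-c": (30.0, 1.5),
}

ranked = []
for name, (score, price) in catalog.items():
    value = value_score(score, price)
    if value is not None:
        ranked.append((name, value))
ranked.sort(key=lambda item: item[1], reverse=True)

print(ranked)
```

Dropping unpriced models, rather than ranking them last, keeps a high raw score like model-b's from being silently misreported as poor value.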

Known limitations

Value rankings favor cheap models even if absolute performance is modest. A model scoring half as well at one-tenth the price wins on value — but may not meet your quality bar. Always check raw scores alongside value rankings.
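One way to act on this advice is to apply a minimum raw score before ranking by value. The floors below are arbitrary example thresholds, and the rows are copied from the rankings on this page. The last lines check the half-as-good-at-one-tenth-the-price case from the paragraph above: such a model ends up with five times the Score/$.

```python
# (name, weighted agentic score, output USD per 1M tokens): rows from this page.
ROWS = [
    ("Ministral 3 3B", 20.8, 0.10),
    ("DeepSeek V4 Flash (Max)", 49.5, 0.28),
    ("GPT-4.1 nano", 40.2, 0.40),
    ("DeepSeek V4 Pro (Max)", 53.9, 0.87),
    ("GPT-5.6 Sol", 75.2, 30.0),
]


def best_value_above(rows, min_score):
    """Best Score/$ among models whose raw score meets the quality floor."""
    eligible = [r for r in rows if r[1] >= min_score]
    if not eligible:
        return None  # nothing clears the bar
    return max(eligible, key=lambda r: r[1] / r[2])[0]


print(best_value_above(ROWS, 0.0))   # Ministral 3 3B
print(best_value_above(ROWS, 45.0))  # DeepSeek V4 Flash (Max)
print(best_value_above(ROWS, 60.0))  # GPT-5.6 Sol

# Half the score at one-tenth the price gives 5x the Score/$.
full_model = 60.0 / 10.0
half_cheap = 30.0 / 1.0
print(half_cheap / full_model)  # 5.0
```

Raising the floor moves the pick from the cheapest model toward the strongest one, which is exactly the trade-off a value-only ranking hides.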
