Benchmark profile

Finance Agent v2

Vals AI benchmark for realistic financial analyst agent tasks across qualitative analysis, quantitative analysis, market work, comparables, precedents, earnings, disclosure, and modeling.

Data verified July 17, 2026

How BenchLM shows Finance Agent v2

BenchLM mirrors the public Vals AI Finance Agent v2 leaderboard captured from https://www.vals.ai/benchmarks/fabv2 and updated by Vals on July 17, 2026. The snapshot preserves overall scores, uncertainty, latency, cost-per-test metadata, and task-level scores where Vals publishes them.
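As a rough illustration of what one mirrored snapshot row could carry, the sketch below models a record in Python. The class name, field names, and units are assumptions made for this example and are not Vals' or BenchLM's actual schema. Only the model ID, provider, and overall score for Gemini 3.5 Flash come from this page; the remaining fields are left empty rather than guessed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SnapshotRow:
    """One mirrored leaderboard row (hypothetical schema, for illustration only)."""
    model: str
    provider: str
    overall_pct: float                          # overall score, in percent
    uncertainty_pct: Optional[float] = None     # reported uncertainty, if published
    latency_s: Optional[float] = None           # latency (unit is an assumption)
    cost_per_test_usd: Optional[float] = None   # cost per test (unit is an assumption)
    task_scores: dict[str, float] = field(default_factory=dict)  # task view -> score

    def has_task_breakdown(self) -> bool:
        """True when the source published task-level scores for this row."""
        return bool(self.task_scores)


# Only these three values are taken from the page.
row = SnapshotRow(model="google/gemini-3.5-flash", provider="Google", overall_pct=57.9)
print(row.has_task_breakdown())
```

Keeping the optional fields as `None` rather than zero makes it possible to tell "not published" apart from a real value of zero.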

Finance Agent v2 is display-only on BenchLM. Vals' proprietary or Vals-hosted aggregate views are useful context, but BenchLM does not use them as weighted ranking inputs or as a replacement for benchmark-native source records.

37 Vals rows · 11 task views · private dataset · Display only
Tasks: Overall, All-Pass, General Qualitative Analysis, General Quantitative Analysis, Market Analysis


Finance Agent v2 scores, July 17, 2026

BenchLM mirrors the published score view for Finance Agent v2. Gemini 3.5 Flash leads the public snapshot at 57.9%, followed by Muse Spark 1 1 (57.2%) and Claude Fable 5 (56.3%). BenchLM does not use these results to rank models overall.

Gemini 3.5 Flash · Google · google/gemini-3.5-flash · 57.9%

Finance Agent v2 score table (37 models)

Model                               Provider          Score
Gemini 3.5 Flash                    Google            57.9%
Muse Spark 1 1                      Meta              57.2%
Claude Fable 5                      Anthropic         56.3%
GPT-5.6 Luna                        OpenAI            55.0%
Kimi K3                             Moonshot AI       54.4%
Claude Opus 4.8                     Anthropic         53.9%
Claude Sonnet 5                     Anthropic         53.9%
GPT-5.6 Sol                         OpenAI            53.8%
GPT-5.6 Terra                       OpenAI            52.4%
GPT-5.5                             OpenAI            51.8%
Claude Opus 4.7                     Anthropic         51.5%
Claude Sonnet 4.6                   Anthropic         51.0%
GLM 5.2                             Zhipu AI          49.7%
Grok 4.5                            SpaceXAI          48.3%
MiniMax M3                          MiniMax           48.3%
Qwen3.7 Max                         Alibaba           47.8%
Inkling                             Thinkingmachines  46.6%
GPT-5.4 Mini                        OpenAI            45.4%
Kimi K2.6                           Moonshot AI       44.9%
GLM 5.1                             Zhipu AI          44.8%
DeepSeek V4 Pro                     DeepSeek          44.1%
Gemini 3.1 Pro Preview              Google            43.0%
Gemini 3 Flash Preview              Google            42.6%
Mimo V2.5 Pro                       Xiaomi            41.5%
Qwen3.6 Plus                        Alibaba           40.8%
Qwen3.7 Plus                        Alibaba           38.2%
GPT-5.4 Nano                        OpenAI            38.2%
Grok 4.3                            SpaceXAI          37.7%
Nemotron 3 Ultra 550b A55b          Nvidia            37.7%
Mimo V2.5                           Xiaomi            36.7%
Mistral Medium 3.5                  Mistral AI        32.1%
Claude Haiku 4.5 20251001 Thinking  Anthropic         31.0%
Gemini 3.1 Flash Lite Preview       Google            30.0%
Grok 4.20 0309 Reasoning            SpaceXAI          28.5%
MiniMax M2.7                        MiniMax           27.9%
Laguna M.1                          Poolside          25.0%
Laguna Xs.2                         Poolside          15.6%

The published Finance Agent v2 snapshot places Gemini 3.5 Flash first at 57.9%. Third-place Claude Fable 5 (56.3%) is 1.6 points behind on the rounded scores shown. The top-10 range, from 57.9% down to GPT-5.5 at 51.8%, is 6.1 points, so the leading results sit in a relatively narrow band.
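The gap figures can be rechecked from the table with a few lines of Python. This is a minimal sketch that uses only the rounded scores displayed on this page, so a result may differ by a tenth of a point from a figure computed on Vals' unrounded data. The helper name `gap` is made up for this example.

```python
# Displayed (rounded) scores for the top ten rows of the table, in percent.
top10 = [57.9, 57.2, 56.3, 55.0, 54.4, 53.9, 53.9, 53.8, 52.4, 51.8]


def gap(a: float, b: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(a - b, 1)


leader_to_third = gap(top10[0], top10[2])    # first row vs. third row
top10_range = gap(max(top10), min(top10))    # spread across the top ten
print(leader_to_third, top10_range)          # prints: 1.6 6.1
```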

37 models have been evaluated on Finance Agent v2. The benchmark falls in BenchLM's Agentic category, which carries a 22% weight in BenchLM.ai's overall scoring system. Finance Agent v2 itself is displayed for reference only and is excluded from the scoring formula, so it does not directly affect overall rankings.
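To make "displayed but excluded" concrete, here is a hypothetical sketch of a weighted overall score that skips display-only benchmarks. It is not BenchLM's actual formula: only the 22% Agentic weight comes from this page, while the second category, the per-category averaging, the example benchmark names, and their scores are all assumptions made for illustration.

```python
from statistics import mean

# Only the 22% Agentic weight is from the page; the other bucket is a placeholder.
CATEGORY_WEIGHTS = {"agentic": 0.22, "everything_else": 0.78}


def overall_score(benchmarks: list[dict]) -> float:
    """Weighted sum of per-category means, ignoring display-only benchmarks."""
    total = 0.0
    for category, weight in CATEGORY_WEIGHTS.items():
        scored = [
            b["score"] for b in benchmarks
            if b["category"] == category and not b["display_only"]
        ]
        if scored:  # a category with no scored benchmarks contributes nothing
            total += weight * mean(scored)
    return total


benchmarks = [
    {"name": "Finance Agent v2", "category": "agentic", "score": 57.9, "display_only": True},
    {"name": "example-agentic-bench", "category": "agentic", "score": 60.0, "display_only": False},
    {"name": "example-other-bench", "category": "everything_else", "score": 70.0, "display_only": False},
]

baseline = overall_score(benchmarks)
benchmarks[0]["score"] = 15.6                 # change the display-only result
unchanged = overall_score(benchmarks) == baseline
print(unchanged)                              # prints: True
```

Because the display-only row is filtered out before averaging, no change to its score can move the overall result in this sketch.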

About Finance Agent v2

Year: 2026
Tasks: Financial analyst task categories
Format: Mean score across repeated runs
Difficulty: Professional expert-task agent workflow

Vals reports Finance Agent v2 as a multi-category benchmark with severity-weighted partial credit and repeated runs per model. BenchLM mirrors the public Vals leaderboard as a display-only expert-task benchmark.
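One way to read "severity-weighted partial credit and repeated runs" is sketched below. This is an assumption-laden illustration, not Vals' rubric: the page does not describe how checks are defined or weighted, so the `(severity, passed)` pairs, the weights, and both function names are hypothetical.

```python
from statistics import mean

Check = tuple[float, bool]  # (severity weight, passed?) -- hypothetical structure


def run_score(checks: list[Check]) -> float:
    """Share of total severity weight earned in a single run."""
    total = sum(severity for severity, _ in checks)
    earned = sum(severity for severity, passed in checks if passed)
    return earned / total if total else 0.0


def model_score(runs: list[list[Check]]) -> float:
    """Mean of the per-run scores across repeated runs."""
    return mean(run_score(run) for run in runs)


runs = [
    [(3.0, True), (1.0, False)],   # severe check passed, minor one missed: 0.75
    [(3.0, False), (1.0, True)],   # severe check missed, minor one passed: 0.25
]
print(model_score(runs))           # prints: 0.5
```

Under this reading, missing a high-severity check costs more than missing a minor one, and averaging over runs smooths out run-to-run variation.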


BenchLM freshness & provenance

Version: Finance Agent v2 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
Status: Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

FAQ

What does Finance Agent v2 measure?

Finance Agent v2 is a Vals AI benchmark that tests agents on realistic financial analyst tasks across qualitative analysis, quantitative analysis, market work, comparables, precedents, earnings, disclosure, and modeling.

Which model leads the published Finance Agent v2 snapshot?

Gemini 3.5 Flash currently leads the published Finance Agent v2 snapshot with a score of 57.9%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on Finance Agent v2?

37 AI models are included in BenchLM's mirrored Finance Agent v2 snapshot, based on the public leaderboard captured on July 17, 2026.

Last updated: July 17, 2026 · mirrored from the public benchmark leaderboard
