VITA-Bench

An interactive real-world agent benchmark grounded in practical consumer-service tasks such as delivery, in-store consumption, and online travel workflows.

Benchmark score on VITA-Bench — June 2, 2026

BenchLM mirrors the published score view for VITA-Bench. Qwen3.7 Max leads the public snapshot at 47.9%, followed by Qwen3.6 Plus (44.3%) and Qwen3.5 397B (43.7%). BenchLM does not use these results to rank models overall.

8 models · Agentic · Current · Display only · Updated June 2, 2026

The published VITA-Bench snapshot is tightly clustered at the top: Qwen3.7 Max sits at 47.9%, while the third row is only 4.2 points behind. The spread across all 8 evaluated models is 32.4 points (47.9% down to 15.5%), so the benchmark still separates strong models even when the leaders cluster.
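The gaps quoted above can be checked against the published score table. The following is a minimal sketch, not BenchLM's own tooling: the `gap` helper and variable names are hypothetical, and the scores are copied from the table on this page.

```python
# Scores in rank order, in percent, copied from the score table on this page.
scores = [47.9, 44.3, 43.7, 35.6, 23.3, 18.5, 17.0, 15.5]


def gap(ranked_scores, high_rank, low_rank):
    """Gap in percentage points between two 1-indexed ranks.

    Hypothetical helper for illustration; rounds to one decimal
    to hide floating-point noise.
    """
    return round(ranked_scores[high_rank - 1] - ranked_scores[low_rank - 1], 1)


lead_gap = gap(scores, 1, 3)               # leader vs. third row
full_spread = gap(scores, 1, len(scores))  # leader vs. last row

print(f"Leader to third row: {lead_gap} points")
print(f"Spread across {len(scores)} models: {full_spread} points")
```

With the eight published scores, the leader-to-third gap is small relative to the overall spread, which is what "clustered at the top" means here.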

Eight models have been evaluated on VITA-Bench, which falls in the Agentic category. That category carries a 22% weight in BenchLM.ai's overall scoring system. VITA-Bench itself is currently displayed for reference only and is excluded from the scoring formula, so it does not directly affect overall rankings.

About VITA-Bench

Year: 2025
Tasks: Interactive consumer-service agent tasks
Format: End-to-end interactive agent evaluation
Difficulty: Long-horizon real-world workflows

VITA-Bench is built to test realistic interactive agent behavior rather than toy tool calls. It stresses long-horizon coordination, tool selection, changing user intent, and domain switching across daily-life applications.

BenchLM freshness & provenance

Version: VITA-Bench 2025
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (8 models)

Rank  Score
1     47.9%
2     44.3%
3     43.7%
4     35.6%
5     23.3%
6     18.5%
7     17.0%
8     15.5%

FAQ

What does VITA-Bench measure?

VITA-Bench measures how well an agent handles interactive, real-world consumer-service tasks such as delivery, in-store consumption, and online travel workflows.

Which model scores highest on VITA-Bench?

Qwen3.7 Max by Alibaba currently leads with a score of 47.9% on VITA-Bench.

How many models are evaluated on VITA-Bench?

BenchLM currently lists 8 AI models evaluated on VITA-Bench.

Last updated: June 2, 2026 · BenchLM version VITA-Bench 2025
