MRCRv2

A long-context benchmark for memory, retrieval, and multi-round coherence over large contexts.

Top models on MRCRv2 — June 2, 2026

As of June 2, 2026, Qwen3.7 Max leads the MRCRv2 leaderboard with 90.4%, followed by Gemini 3.5 Flash at 77.3%.

2 models · Reasoning · 25% of category score · Current · Updated June 2, 2026

About MRCRv2

Year

2025

Tasks

Long-context retrieval

Format

Multi-round long-context evaluation

Difficulty

Hard long-context

MRCRv2 is most informative for comparing models that compete on long-context performance, because it tests whether a model can retrieve the right information across long, multi-round interactions.

BenchLM freshness & provenance

Version

MRCRv2 2025

Refresh cadence

Quarterly

Staleness state

Current

Question availability

Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (2 models)

1. Qwen3.7 Max: 90.4%
2. Gemini 3.5 Flash: 77.3%

FAQ

What does MRCRv2 measure?

MRCRv2 measures memory, retrieval, and multi-round coherence over large contexts.

Which model scores highest on MRCRv2?

Qwen3.7 Max by Alibaba currently leads with a score of 90.4% on MRCRv2.

How many models are evaluated on MRCRv2?

BenchLM has evaluated two AI models on MRCRv2.

Last updated: June 2, 2026 · BenchLM version MRCRv2 2025
