
MMLU-ProX

A multilingual extension of professional-level academic evaluation across many languages.

Top models on MMLU-ProX — June 2, 2026

As of June 2, 2026, Qwen3.7 Max leads the MMLU-ProX leaderboard with 87%, followed by Claude Opus 4.5 (85.7%) and Qwen3.6 Plus (84.7%).

10 models · Multilingual · 65% of category score · Current · Updated June 2, 2026

According to BenchLM.ai, the top three models are clustered within 2.3 points (87% down to 84.7%), suggesting this benchmark is nearing saturation for frontier models.

10 models have been evaluated on MMLU-ProX. The benchmark falls in the Multilingual category. This category carries a 7% weight in BenchLM.ai's overall scoring system. Within that category, MMLU-ProX contributes 65% of the category score, so strong performance here directly affects a model's overall ranking.
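As a rough illustration of the weighting described above, the sketch below assumes the 7% category weight and the 65% within-category share combine by simple multiplication. That is an assumption, not BenchLM's documented formula (the methodology page is the authority), and the function and constant names are hypothetical.

```python
# Hypothetical sketch: ASSUMES the two weights combine multiplicatively.
# BenchLM's actual aggregation may differ; see its methodology page.
CATEGORY_WEIGHT = 0.07  # Multilingual category share of the overall score (from the text)
BENCHMARK_SHARE = 0.65  # MMLU-ProX share of the Multilingual category (from the text)


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Benchmark's share of the overall score under the multiplicative assumption."""
    return category_weight * benchmark_share


def overall_points_from_gap(score_gap: float) -> float:
    """Convert a gap in MMLU-ProX points into overall-score points."""
    return score_gap * effective_weight(CATEGORY_WEIGHT, BENCHMARK_SHARE)


if __name__ == "__main__":
    weight = effective_weight(CATEGORY_WEIGHT, BENCHMARK_SHARE)
    print(f"Effective weight: {weight:.2%}")
    # 1.3 points is the gap between the first- and second-place scores.
    print(f"1.3-point lead in overall points: {overall_points_from_gap(1.3):.3f}")
```

Under that assumption, MMLU-ProX would account for roughly 4.55% of a model's overall score, so even a lead of a full point here moves the overall ranking only slightly.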

About MMLU-ProX

Year: 2025
Tasks: Multilingual professional QA
Format: Multilingual multiple choice
Difficulty: Professional multilingual

MMLU-ProX expands multilingual evaluation beyond translated arithmetic, making it a better signal for broad cross-lingual reasoning and knowledge.

BenchLM freshness & provenance

Version: MMLU-ProX 2025
Refresh cadence: Static
Staleness state: Current
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (10 models)

Rank  Score
1     87%
2     85.7%
3     84.7%
4     84.7%
5     83.1%
6     82.3%
7     82.2%
8     82.2%
9     81%
10    79.4%
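The 2.3-point clustering mentioned earlier can be checked directly from the leaderboard scores. The sketch below is only an illustration: the scores are copied from the list above, while the `spread` helper and its name are hypothetical and not part of any BenchLM tooling.

```python
# Scores copied from the leaderboard above, in rank order (percent).
SCORES = [87.0, 85.7, 84.7, 84.7, 83.1, 82.3, 82.2, 82.2, 81.0, 79.4]


def spread(scores: list[float], top_n: int) -> float:
    """Gap in points between the best and worst of the top_n scores."""
    top = sorted(scores, reverse=True)[:top_n]
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(top[0] - top[-1], 1)


if __name__ == "__main__":
    print("Top-3 spread:", spread(SCORES, 3))
    print("Full-leaderboard spread:", spread(SCORES, len(SCORES)))
```

The top three sit 2.3 points apart, while the whole leaderboard spans 7.6 points, so most of the remaining separation is between the leaders and the lower half of the list.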

FAQ

What does MMLU-ProX measure?

A multilingual extension of professional-level academic evaluation across many languages.

Which model scores highest on MMLU-ProX?

Qwen3.7 Max by Alibaba currently leads with a score of 87% on MMLU-ProX.

How many models are evaluated on MMLU-ProX?

10 AI models have been evaluated on MMLU-ProX on BenchLM.

Last updated: June 2, 2026 · BenchLM version MMLU-ProX 2025
