
Massive Multi-discipline Multimodal Understanding Pro (MMMU-Pro)

A harder multimodal benchmark for frontier models that combines text with images, diagrams, charts, and academic visual reasoning tasks.

Top models on MMMU-Pro — June 2, 2026

As of June 2, 2026, GPT-5.4 Pro leads the MMMU-Pro leaderboard with 94%, followed by Claude Mythos Preview (92.7%) and Gemini 3.1 Pro (83.9%).

28 models · Multimodal & Grounded · 45% of category score · Refreshing · Updated June 2, 2026

According to BenchLM.ai, GPT-5.4 Pro leads the MMMU-Pro benchmark with a score of 94%, followed by Claude Mythos Preview (92.7%) and Gemini 3.1 Pro (83.9%). Scores range from 94% down to 63%. The top two models sit 8.8 points clear of third place, and most of the remaining field falls between 75% and 84%.

28 models have been evaluated on MMMU-Pro. The benchmark falls in the Multimodal & Grounded category. This category carries a 12% weight in BenchLM.ai's overall scoring system. Within that category, MMMU-Pro contributes 45% of the category score, so strong performance here directly affects a model's overall ranking.
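To make the weighting concrete, here is a minimal sketch of the arithmetic. It assumes the two percentages combine multiplicatively, as in a plain nested weighted average. That is an assumption for illustration, not BenchLM's documented formula, and the function names are hypothetical.

```python
# Weights quoted on this page.
CATEGORY_WEIGHT = 0.12   # Multimodal & Grounded share of the overall score
BENCHMARK_SHARE = 0.45   # MMMU-Pro share of that category's score


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Share of the overall score attributable to one benchmark.

    Assumption: shares combine multiplicatively (nested weighted average).
    """
    return category_weight * benchmark_share


def overall_points(score_pct: float) -> float:
    """Overall-score points contributed by an MMMU-Pro score given in percent."""
    return score_pct * effective_weight(CATEGORY_WEIGHT, BENCHMARK_SHARE)


if __name__ == "__main__":
    weight = effective_weight(CATEGORY_WEIGHT, BENCHMARK_SHARE)
    gap = overall_points(94) - overall_points(63)
    print(f"Effective weight of MMMU-Pro: {weight:.1%}")
    print(f"Overall-score gap between ranks 1 and 28: {gap:.2f} points")
```

Under that assumption, MMMU-Pro accounts for about 5.4% of the overall score, so the 31-point gap between the first and last model on this leaderboard is worth roughly 1.7 overall points.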

About MMMU-Pro

Year: 2024
Tasks: Multimodal academic reasoning
Format: Image + text question answering
Difficulty: Frontier multimodal

MMMU-Pro extends the original MMMU setup with more difficult multimodal questions and stronger separation at the top end of the model market.

BenchLM freshness & provenance

Version: MMMU-Pro 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (28 models)

Rank  Score
1     94%
2     92.7%
3     83.9%
4     83.6%
5     81.2%
6     81.2%
7     81%
8     80.4%
9     79.5%
10    79.4%
11    79%
12    78.8%
13    78.5%
14    78.5%
15    78.1%
16    78.1%
17    77.9%
18    77.3%
19    76.9%
20    76.6%
21    75.8%
22    75.3%
23    75.2%
24    73.8%
25    71.1%
26    70.6%
27    66.1%
28    63%
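The spread of the leaderboard can be checked directly from the scores listed above. The following is a small sketch using only Python's standard library. The `largest_gap` helper is a hypothetical name introduced for illustration, and the scores are copied from this page.

```python
from statistics import median

# MMMU-Pro scores (percent) in rank order, copied from the leaderboard above.
SCORES = [
    94, 92.7, 83.9, 83.6, 81.2, 81.2, 81, 80.4, 79.5, 79.4,
    79, 78.8, 78.5, 78.5, 78.1, 78.1, 77.9, 77.3, 76.9, 76.6,
    75.8, 75.3, 75.2, 73.8, 71.1, 70.6, 66.1, 63,
]


def largest_gap(scores):
    """Return (rank, gap): the 1-based rank just above the widest drop
    between adjacent entries, and the size of that drop in points."""
    ordered = sorted(scores, reverse=True)
    gaps = [(ordered[i] - ordered[i + 1], i + 1) for i in range(len(ordered) - 1)]
    gap, rank = max(gaps)
    return rank, round(gap, 1)


if __name__ == "__main__":
    rank, gap = largest_gap(SCORES)
    print(f"Models: {len(SCORES)}")
    print(f"Range: {max(SCORES) - min(SCORES):.1f} points")
    print(f"Median: {median(SCORES):.1f}%")
    print(f"Widest drop: {gap} points, between ranks {rank} and {rank + 1}")
```

Run on the listed scores, this reports a 31-point range, a median of 78.3%, and the widest drop (8.8 points) between ranks 2 and 3.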

FAQ

What does MMMU-Pro measure?

MMMU-Pro measures how well frontier models answer academic questions that combine text with images, diagrams, and charts. It is a harder version of the original MMMU benchmark.

Which model scores highest on MMMU-Pro?

GPT-5.4 Pro by OpenAI currently leads with a score of 94% on MMMU-Pro.

How many models are evaluated on MMMU-Pro?

BenchLM lists 28 AI models that have been evaluated on MMMU-Pro.

Last updated: June 2, 2026 · BenchLM version MMMU-Pro 2024
