Benchmark profile

MMMLU

A multilingual MMLU-style benchmark reported in provider evaluation tables.

Data verified July 23, 2026

Benchmark score on MMMLU — July 23, 2026

BenchLM mirrors the published score view for MMMLU. Interfaze Beta leads the public snapshot at 90.9%, followed by Qwen3.7 Max (90.3%) and DeepSeek V4 Pro Base (90.3%). BenchLM does not use these results to rank models overall.

1 · Closed

Interfaze Beta

Interfaze

interfaze-beta

90.9%

Overall: n/a · Context: 1M

2 · Closed

Qwen3.7 Max

Alibaba

qwen3-7-max

90.3%

Overall: 72.84 · Context: 1M

3 · Open

DeepSeek V4 Pro Base

DeepSeek

deepseek-v4-pro-base

90.3%

Overall: n/a · Context: 1M

6 models · Knowledge · Current · Display only · Updated July 23, 2026

Benchmark score table (6 models)

Model                   Provider   Access       Score
Interfaze Beta          Interfaze  Closed       90.9%
Qwen3.7 Max             Alibaba    Closed       90.3%
DeepSeek V4 Pro Base    DeepSeek   Open weight  90.3%
Qwen3.7 Plus            Alibaba    Closed       89.0%
DeepSeek V4 Flash Base  DeepSeek   Open weight  88.8%
Gemma 4 12B             Google     Open weight  83.4%

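The published MMMLU snapshot places Interfaze Beta first at 90.9%. The second and third rows are tied at 90.3%, 0.6 points behind. The range across all six published results is 7.5 points, but five of the six sit within 2.1 points of each other (88.8% to 90.9%); only Gemma 4 12B at 83.4% falls outside that band.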
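The gap and range figures quoted above can be rechecked directly from the published scores. The following is a minimal sketch, not BenchLM's own tooling: the `scores` dictionary is copied from the score table on this page, and the `spread` helper is a hypothetical name introduced only for illustration.

```python
# Scores copied from the published MMMLU score table on this page (percent).
scores = {
    "Interfaze Beta": 90.9,
    "Qwen3.7 Max": 90.3,
    "DeepSeek V4 Pro Base": 90.3,
    "Qwen3.7 Plus": 89.0,
    "DeepSeek V4 Flash Base": 88.8,
    "Gemma 4 12B": 83.4,
}


def spread(values):
    """Return (leader minus third row, leader minus last row) in points.

    Hypothetical helper for illustration; results are rounded to one
    decimal place to hide floating-point noise.
    """
    ranked = sorted(values, reverse=True)
    gap_to_third = round(ranked[0] - ranked[2], 1)
    full_range = round(ranked[0] - ranked[-1], 1)
    return gap_to_third, full_range


gap_to_third, full_range = spread(scores.values())
print(f"Leader vs third row: {gap_to_third} points")
print(f"Range across all rows: {full_range} points")
```

Rounding to one decimal matches the precision of the published values, so the comparison does not imply more accuracy than the table provides.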

Six models have been evaluated on MMMLU. The benchmark falls in the Knowledge category, which carries a 12% weight in BenchLM.ai's overall scoring system. MMMLU itself is currently displayed for reference only and is excluded from the scoring formula, so it does not directly affect overall rankings.

About MMMLU

Year: 2026

Tasks: Multilingual academic QA

Format: Exact match

Difficulty: Broad multilingual knowledge

BenchLM stores MMMLU as a display-only provider-table row when exact public values are published.
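The profile lists the format as exact match. As a rough illustration of what that usually means for multiple-choice QA, the sketch below scores a prediction as correct only when it equals the reference answer after light normalization. The function name, the whitespace and case normalization, and the sample answers are all assumptions made for illustration; they do not describe how BenchLM or any provider actually grades MMMLU.

```python
def exact_match_accuracy(predictions, references):
    """Percentage of predictions that exactly match their reference.

    Assumption: answers are compared after stripping whitespace and
    upper-casing. Real evaluation harnesses may normalize differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        pred.strip().upper() == ref.strip().upper()
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(references)


# Made-up sample answers, used only to exercise the function.
sample_predictions = ["B", " c", "A", "D"]
sample_references = ["B", "C", "D", "D"]

accuracy = exact_match_accuracy(sample_predictions, sample_references)
print(f"Exact-match accuracy: {accuracy:.1f}%")
```

Under exact match there is no partial credit, so a near-miss answer counts the same as any other wrong answer.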


BenchLM freshness & provenance

Version: MMMLU 2026

Refresh cadence: Quarterly

Staleness state: Current

Question availability: Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

FAQ

What does MMMLU measure?

A multilingual MMLU-style benchmark reported in provider evaluation tables.

Which model scores highest on MMMLU?

Interfaze Beta by Interfaze currently leads with a score of 90.9% on MMMLU.

How many models are evaluated on MMMLU?

Six AI models have been evaluated on MMMLU on BenchLM.

Compare Top Models on MMMLU

Interfaze Beta vs Qwen3.7 Max
Qwen3.7 Max vs DeepSeek V4 Pro Base
DeepSeek V4 Pro Base vs Qwen3.7 Plus
Qwen3.7 Plus vs DeepSeek V4 Flash Base

Last updated: July 23, 2026 · BenchLM version MMMLU 2026
