Benchmark profile

Artificial Analysis Omniscience Hallucination Rate (AA-Omniscience Hallucination Rate)

A display-only Artificial Analysis factuality metric for the rate of incorrect answers among non-correct responses.
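As a rough illustration of the definition above, the rate can be computed from graded response counts. This is a minimal sketch, not Artificial Analysis's implementation: the function name, the argument names, and the sample counts are hypothetical, and it assumes "non-correct" means every response not graded as correct.

```python
def hallucination_rate(correct: int, incorrect: int, total: int) -> float:
    """Percentage of non-correct responses that are incorrect answers.

    Assumption: "non-correct" covers everything not graded correct,
    including abstentions (total - correct).
    """
    if total < 0 or correct < 0 or incorrect < 0 or correct + incorrect > total:
        raise ValueError("counts are inconsistent")
    non_correct = total - correct
    if non_correct == 0:
        return 0.0  # nothing was non-correct, so nothing was hallucinated
    return 100.0 * incorrect / non_correct


# Hypothetical counts: 1000 questions, 400 correct, 150 incorrect, 450 abstained.
rate = hallucination_rate(correct=400, incorrect=150, total=1000)
print(f"{rate:.1f}%")  # 25.0%
```

Under this reading, a model lowers its rate by declining to answer when unsure, not only by answering more questions correctly.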

Data verified July 23, 2026

Benchmark score on AA-Omniscience Hallucination Rate — July 23, 2026

BenchLM mirrors the published score view for AA-Omniscience Hallucination Rate. Command A+ leads the public snapshot at 14.1%, followed by MiniMax M3 (16.1%) and Qwen3.7 Max (22.9%). Lower is better on this metric. BenchLM does not use these results to rank models overall.

1 · Open

Command A+

Cohere

command-a-plus

14.1%

Overall 47.51 · Context 128K

2 · Open

MiniMax M3

MiniMax

minimax-m3

16.1%

Overall 69.75 · Context 1M

3 · Closed

Qwen3.7 Max

Alibaba

qwen3-7-max

22.9%

Overall 72.84 · Context 1M

153 models · Knowledge · Current · Display only · Updated July 23, 2026

Benchmark score table (153 models)

Rank · Model · Provider · Access · Score
1 · Command A+ · Cohere · Open weight · 14.1%
2 · MiniMax M3 · MiniMax · Open weight · 16.1%
3 · Qwen3.7 Max · Alibaba · Closed · 22.9%
4 · MiMo-V2.5-Pro · Xiaomi · Closed · 24.5%
5 · Grok 4.3 · xAI · Closed · 25.0%
6 · Qwen3.7 Plus · Alibaba · Closed · 25.5%
7 · GLM-5.2 · Z.AI · Open weight · 28.1%
8 · Nemotron 3 Ultra · NVIDIA · Open weight · 28.5%
9 · GLM-5.1 · Z.AI · Open weight · 29.4%
10 · MiMo-V2-Pro · Xiaomi · Closed · 29.9%
11 · Gemma 4 E4B · Google · Open weight · 31.3%
12 · Qwen3.6 Plus · Alibaba · Closed · 32.0%
13 · Gemma 4 E2B · Google · Open weight · 32.9%
14 · Gemini 3.5 Flash-Lite · Google · Closed · 33.5%
15 · GLM-5 · Z.AI · Open weight · 34.0%
16 · MiniMax M2.7 · MiniMax · Open weight · 34.4%
17 · Claude Opus 4.8 · Anthropic · Closed · 35.9%
18 · Claude Opus 4.7 (Adaptive) · Anthropic · Closed · 36.2%
19 · Claude Sonnet 5 · Anthropic · Closed · 37.3%
20 · GPT-4o · OpenAI · Closed · 37.9%
21 · Muse Spark 1.1 · Meta · Closed · 38.1%
22 · Kimi K2.6 · Moonshot AI · Open weight · 39.3%
23 · Claude 4 Sonnet · Anthropic · Closed · 40.8%
24 · Qwen 3.6 Max (preview) · Alibaba · Closed · 44.2%
25 · MiMo-V2-Omni · Xiaomi · Closed · 44.4%
26 · LFM2.5-8B-A1B · LiquidAI · Open weight · 47.0%
27 · Qwen3.6-27B · Alibaba · Open weight · 48.3%
28 · Qwen3.6-35B-A3B · Alibaba · Open weight · 49.7%
29 · Gemini 3.1 Pro · Google · Closed · 49.9%
30 · Kimi K3 · Moonshot AI · Closed · 50.9%
31 · Llama 3.1 405B · Meta · Open weight · 51.0%
32 · GPT-5.1 · OpenAI · Closed · 51.3%
33 · Claude Opus 4.7 · Anthropic · Closed · 51.9%
34 · Gemini 3.6 Flash · Google · Closed · 53.5%
35 · Grok 4.5 · xAI · Closed · 53.5%
36 · GPT-5 mini · OpenAI · Closed · 54.1%
37 · Claude Fable 5 · Anthropic · Closed · 54.9%
38 · GPT-5 nano · OpenAI · Closed · 56.3%
39 · Claude Opus 4.5 Thinking · Anthropic · Closed · 59.8%
40 · Gemini 3.5 Flash · Google · Closed · 60.7%
41 · Mistral Medium 3 · Mistral · Closed · 60.9%
42 · Claude Opus 4.6 (Adaptive) · Anthropic · Closed · 61.3%
43 · GLM-5-Turbo · Z.AI · Closed · 62.2%
44 · Inkling · Thinking Machines Lab · Open weight · 63.1%
45 · Grok 4 · xAI · Closed · 64.2%
46 · Kimi K2.5 · Moonshot AI · Open weight · 64.6%
47 · Kimi K2.5 (Reasoning) · Moonshot AI · Closed · 64.6%
48 · Claude Sonnet 4.6 · Anthropic · Closed · 65.9%
49 · Grok 4 Fast (Reasoning) · xAI · Closed · 66.0%
50 · GLM-4.6 · Z.AI · Open weight · 66.1%
51 · Mistral Small 4 · Mistral · Open weight · 66.8%
52 · Mistral Small 4 (Reasoning) · Mistral · Open weight · 66.8%
53 · Mistral Large 2 · Mistral · Closed · 67.8%
54 · GLM-5V-Turbo · Z.AI · Closed · 67.9%
55 · o1 · OpenAI · Closed · 69.3%
56 · LFM2-24B-A2B · LiquidAI · Closed · 70.0%
57 · Grok 4.1 Fast (Reasoning) · xAI · Closed · 72.4%
58 · GPT-5.2-Codex · OpenAI · Closed · 72.8%
59 · Hy3 Preview · Tencent · Open weight · 73.0%
60 · Hy3 · Tencent · Open weight · 73.0%
61 · Muse Spark · Meta · Closed · 73.2%
62 · GPT-5.4 nano · OpenAI · Closed · 73.6%
63 · Kimi K2 · Moonshot AI · Closed · 74.2%
64 · GPT-5.1-Codex-Max · OpenAI · Closed · 74.4%
65 · GPT-5.1-Codex · OpenAI · Closed · 74.4%
66 · MiMo-V2-Flash · Xiaomi · Open weight · 75.1%
67 · Claude Opus 4.5 · Anthropic · Closed · 75.4%
68 · Claude Opus 4.6 · Anthropic · Closed · 76.0%
69 · Ministral 3 3B (Reasoning) · Mistral · Open weight · 77.5%
70 · Ministral 3 3B · Mistral · Open weight · 77.5%
71 · Granite-4.0-350M · IBM · Open weight · 77.8%
72 · Nova Pro · Amazon · Closed · 77.9%
73 · Claude 3 Haiku · Anthropic · Closed · 78.2%
74 · Llama 4 Scout · Meta · Open weight · 78.3%
75 · Grok Code Fast 1 · xAI · Closed · 78.5%
76 · GPT-4.1 · OpenAI · Closed · 79.6%
77 · GPT-5.2 · OpenAI · Closed · 79.7%
78 · Qwen3.5-27B · Alibaba · Open weight · 79.7%
79 · GPT-5 (medium) · OpenAI · Closed · 80.1%
80 · DeepSeek V3.1 (Reasoning) · DeepSeek · Open weight · 80.3%
81 · Kimi K2.7 Code · Moonshot AI · Open weight · 80.3%
82 · GPT-4.1 nano · OpenAI · Closed · 80.4%
83 · Phi-4 · Microsoft · Open weight · 80.5%
84 · Gemma 4 12B · Google · Open weight · 80.8%
85 · Gemma 4 26B A4B · Google · Open weight · 80.9%
86 · Exaone 4.0 32B · LG AI Research · Open weight · 81.0%
87 · Gemini 3.1 Flash-Lite · Google · Closed · 81.6%
88 · Gemma 4 31B · Google · Open weight · 81.6%
89 · Nemotron Ultra 253B · NVIDIA · Open weight · 81.7%
90 · Grok 4.1 Fast · xAI · Closed · 81.8%
91 · Mistral Medium 3.5 128B · Mistral · Open weight · 82.0%
92 · GPT-4.1 mini · OpenAI · Closed · 82.0%
93 · GPT-5 (high) · OpenAI · Closed · 82.1%
94 · Nemotron 3 Nano 30B · NVIDIA · Open weight · 82.9%
95 · Nemotron 3 Nano Omni 30B A3B · NVIDIA · Open weight · 83.1%
96 · Granite-4.0-H-1B · IBM · Open weight · 83.4%
97 · DeepSeek V3.1 · DeepSeek · Open weight · 83.5%
98 · Mistral Large 3 · Mistral · Closed · 83.7%
99 · Qwen3.5-35B-A3B · Alibaba · Open weight · 84.0%
100 · DeepSeek-R1 · DeepSeek · Open weight · 84.0%
101 · Step 3.7 Flash · StepFun · Open weight · 84.4%
102 · LFM2.5-1.2B-Instruct · LiquidAI · Closed · 84.8%
103 · GPT-5.6 Terra · OpenAI · Closed · 85.2%
104 · GPT-5.5 · OpenAI · Closed · 85.5%
105 · Qwen3.5-122B-A10B · Alibaba · Open weight · 85.5%
106 · Trinity-Large-Preview · Arcee AI · Open weight · 86.6%
107 · Trinity-Large-Thinking · Arcee AI · Open weight · 86.6%
108 · MiniMax M1 80k · MiniMax · Closed · 86.8%
109 · GPT-5.3 Codex · OpenAI · Closed · 86.9%
110 · GPT-5.3-Codex-Spark · OpenAI · Closed · 86.9%
111 · Nemotron 3 Super 120B A12B · NVIDIA · Open weight · 87.0%
112 · o3 · OpenAI · Closed · 87.1%
113 · Llama 4 Maverick · Meta · Open weight · 87.3%
114 · Gemini 2.5 Pro · Google · Closed · 87.4%
115 · GPT-5.4 · OpenAI · Closed · 88.6%
116 · DeepSeek V4 Pro (High) · DeepSeek · Open weight · 88.6%
117 · GPT-5.6 Sol · OpenAI · Closed · 88.8%
118 · Ministral 3 8B (Reasoning) · Mistral · Open weight · 89.0%
119 · Ministral 3 8B · Mistral · Open weight · 89.0%
120 · Qwen3.5 397B · Alibaba · Open weight · 89.1%
121 · Qwen3.5 397B (Reasoning) · Alibaba · Open weight · 89.1%
122 · K-Exaone · LG AI Research · Closed · 89.1%
123 · MiniMax M2.5 · MiniMax · Closed · 89.3%
124 · DeepSeek V3 · DeepSeek · Open weight · 89.4%
125 · Qwen3 Max · Alibaba · Closed · 89.4%
126 · Gemma 3 27B · Google · Open weight · 89.5%
127 · GLM-4.7-Flash · Z.AI · Open weight · 89.5%
128 · DeepSeek V4 Flash (High) · DeepSeek · Open weight · 89.7%
129 · GPT-5.4 mini · OpenAI · Closed · 89.8%
130 · GPT-5.6 Luna · OpenAI · Closed · 90.1%
131 · Gemini 3 Flash · Google · Closed · 90.2%
132 · Ministral 3 14B (Reasoning) · Mistral · Open weight · 90.2%
133 · Ministral 3 14B · Mistral · Open weight · 90.2%
134 · GLM-4.7 · Z.AI · Open weight · 90.3%
135 · Gemini 3 Pro · Google · Closed · 90.9%
136 · GPT-OSS 120B · OpenAI · Open weight · 91.2%
137 · Exaone 4.0 1.2B · LG AI Research · Open weight · 91.5%
138 · Solar Pro 2 · Upstage · Closed · 91.5%
139 · Mercury 2 · Inception · Closed · 91.5%
140 · Step 3.5 Flash · StepFun · Open weight · 91.6%
141 · GLM-4.5-Air · Z.AI · Closed · 92.3%
142 · Gemini 2.5 Flash · Google · Closed · 93.3%
143 · DeepSeek V3.2 · DeepSeek · Open weight · 93.5%
144 · Sarvam 105B · Sarvam · Open weight · 93.5%
145 · Granite-4.0-1B · IBM · Open weight · 93.5%
146 · DeepSeek V4 Pro (Max) · DeepSeek · Open weight · 94.0%
147 · LFM2.5-VL-1.6B-Extract · LiquidAI · Open weight · 94.0%
148 · GPT-OSS 20B · OpenAI · Open weight · 94.1%
149 · Granite-4.0-H-350M · IBM · Open weight · 94.4%
150 · DeepSeek V4 Flash (Max) · DeepSeek · Open weight · 95.8%
151 · Ling 2.6 Flash · InclusionAI · Open weight · 95.8%
152 · LFM2.5-1.2B-Thinking · LiquidAI · Closed · 96.9%
153 · Sarvam 30B · Sarvam · Open weight · 97.0%

The published AA-Omniscience Hallucination Rate snapshot places Command A+ first at 14.1%. Third-place Qwen3.7 Max sits 8.8 points higher at 22.9%, and the top 10 span 15.8 points (14.1% to 29.9%), so the leading systems are clearly separated rather than clustered.
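The two gaps quoted above can be rechecked from the table with a few lines of arithmetic. This is only a sketch: the list holds the ten lowest published rates copied from the score table, and the variable names are illustrative.

```python
# Ten lowest published rates (percent), in ascending order, from the score table.
top10 = [14.1, 16.1, 22.9, 24.5, 25.0, 25.5, 28.1, 28.5, 29.4, 29.9]

# Round to one decimal place to hide binary floating-point noise.
gap_to_third = round(top10[2] - top10[0], 1)   # third row minus leader
top10_range = round(top10[-1] - top10[0], 1)   # tenth row minus leader

print(gap_to_third, top10_range)  # 8.8 15.8
```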

153 models have been evaluated on AA-Omniscience Hallucination Rate. The benchmark falls in the Knowledge category. This category carries a 12% weight in BenchLM.ai's overall scoring system. AA-Omniscience Hallucination Rate is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.

About AA-Omniscience Hallucination Rate

Year

2026

Tasks

Knowledge questions

Format

Hallucination rate

Difficulty

Factuality

BenchLM marks this row lower-is-better because a lower hallucination rate is preferable, even though the OpenRouter card displays the raw percentage.
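In practice, lower-is-better means the table is ordered by ascending rate. The sketch below shows that ordering on three rows from the snapshot; the `Row` type and variable names are hypothetical and do not describe BenchLM's code.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    rate: float  # hallucination rate in percent; lower is better


# Three rows from the published snapshot, deliberately out of order.
rows = [
    Row("Qwen3.7 Max", 22.9),
    Row("Command A+", 14.1),
    Row("MiniMax M3", 16.1),
]

# Ascending sort puts the lowest hallucination rate first.
ranked = sorted(rows, key=lambda r: r.rate)

for rank, row in enumerate(ranked, start=1):
    print(f"{rank} · {row.model} · {row.rate}%")
```

Because Python's sort is stable, rows with equal rates keep their input order, so ties need an explicit secondary key if a specific order is wanted.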

Artificial Analysis model benchmarks

BenchLM freshness & provenance

Version

AA-Omniscience Hallucination Rate 2026

Refresh cadence

Quarterly

Staleness state

Current

Question availability

Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

FAQ

What does AA-Omniscience Hallucination Rate measure?

A display-only Artificial Analysis factuality metric for the rate of incorrect answers among non-correct responses.

Which model leads AA-Omniscience Hallucination Rate?

Command A+ by Cohere currently leads with the lowest published rate, 14.1%. Lower is better on this metric, so the leader has the lowest score rather than the highest.

How many models are evaluated on AA-Omniscience Hallucination Rate?

153 AI models have been evaluated on AA-Omniscience Hallucination Rate on BenchLM.

Compare Top Models on AA-Omniscience Hallucination Rate

Command A+ vs MiniMax M3 · MiniMax M3 vs Qwen3.7 Max · Qwen3.7 Max vs MiMo-V2.5-Pro · MiMo-V2.5-Pro vs Grok 4.3

Last updated: July 23, 2026 · BenchLM version AA-Omniscience Hallucination Rate 2026
