Large Language Model Capability Leaderboard


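| Model | Organization | Global Average | Reasoning Average | Coding Average | Agentic Coding Average | Mathematics Average | Data Analysis Average | Language Average | Instruction Following Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude 4.6 Opus Thinking High Effort | Anthropic | 76.33 | 88.67 | 78.18 | 61.67 | 89.32 | 69.89 | 83.27 | 63.31 |
| Claude 4.5 Opus Thinking High Effort | Anthropic | 75.96 | 80.09 | 79.65 | 63.33 | 90.39 | 74.44 | 81.26 | 62.55 |
| GPT-5.2 High | OpenAI | 74.84 | 83.21 | 76.07 | 51.67 | 93.17 | 78.16 | 79.81 | 61.77 |
| GPT-5.2 Codex | OpenAI | 74.30 | 77.71 | 83.62 | 51.67 | 88.77 | 78.20 | 73.68 | 66.45 |
| GPT-5.1 Codex Max High | OpenAI | 73.98 | 83.65 | 80.68 | 53.33 | 83.22 | 70.12 | 76.48 | 70.38 |
| Gemini 3 Pro Preview High | Google | 73.39 | 77.42 | 74.60 | 55.00 | 81.84 | 74.39 | 84.62 | 65.85 |
| Gemini 3 Flash Preview High | Google | 72.40 | 74.55 | 73.90 | 40.00 | 84.17 | 74.77 | 84.56 | 74.86 |
| GPT-5.1 High | OpenAI | 72.04 | 78.79 | 72.49 | 53.33 | 86.90 | 69.61 | 79.26 | 63.90 |
| GPT-5 Pro | OpenAI | 70.48 | 81.69 | 72.11 | 51.67 | 86.17 | 57.04 | 80.69 | 63.96 |
| Kimi K2.5 Thinking | Moonshot AI | 69.07 | 75.96 | 77.86 | 48.33 | 84.87 | 61.36 | 77.67 | 57.41 |
| GPT-5.1 Codex | OpenAI | 68.61 | 81.98 | 71.78 | 53.33 | 79.58 | 60.75 | 69.48 | 63.39 |
| Claude Sonnet 4.5 Thinking | Anthropic | 68.19 | 77.59 | 80.36 | 53.33 | 79.31 | 56.97 | 76.45 | 53.35 |
| GPT-5 Mini High | OpenAI | 65.91 | 68.32 | 68.20 | 46.67 | 82.20 | 55.20 | 75.52 | 65.27 |
| DeepSeek V3.2 Thinking | DeepSeek | 62.20 | 77.17 | 64.62 | 40.00 | 85.03 | 50.00 | 70.41 | 48.19 |
| Grok 4 | xAI | 62.02 | 79.13 | 73.13 | 30.00 | 83.02 | 63.38 | 76.39 | 29.07 |
| Claude 4.1 Opus Thinking | Anthropic | 61.81 | 72.33 | 74.66 | 48.33 | 73.19 | 48.98 | 72.76 | 42.40 |
| Kimi K2 Thinking | Moonshot AI | 61.59 | 63.49 | 67.44 | 38.33 | 81.10 | 52.29 | 66.45 | 62.03 |
| Claude Haiku 4.5 Thinking | Anthropic | 61.32 | 61.68 | 72.81 | 41.67 | 77.53 | 59.30 | 66.45 | 49.78 |
| Claude 4 Sonnet Thinking | Anthropic | 61.27 | 69.01 | 77.48 | 40.00 | 70.50 | 54.63 | 72.91 | 44.34 |
| GPT-5.1 Codex Mini | OpenAI | 60.38 | 64.71 | 69.93 | 40.00 | 76.26 | 49.70 | 63.01 | 59.02 |
| Grok 4.1 Fast | xAI | 59.99 | 80.20 | 69.61 | 31.67 | 83.72 | 52.24 | 74.33 | 28.20 |
| Claude 4.5 Opus Medium Effort | Anthropic | 59.10 | 53.21 | 78.51 | 63.33 | 66.32 | 45.54 | 78.66 | 28.11 |
| DeepSeek V3.2 Exp Thinking | DeepSeek | 58.90 | 64.37 | 70.06 | 31.67 | 82.40 | 51.50 | 71.06 | 41.27 |
| Gemini 2.5 Pro (Max Thinking) | Google | 58.33 | 70.81 | 75.69 | 33.33 | 68.32 | 51.62 | 75.50 | 33.07 |
| GLM 4.7 | Z.AI | 58.09 | 59.73 | 73.13 | 41.67 | 76.02 | 55.17 | 65.23 | 35.66 |
| GLM 4.6 | Z.AI | 55.19 | 62.06 | 71.02 | 35.00 | 81.13 | 51.95 | 58.99 | 26.19 |
| Claude 4.1 Opus | Anthropic | 54.45 | 40.89 | 76.07 | 53.33 | 62.83 | 45.38 | 76.75 | 25.92 |
| Claude Sonnet 4.5 | Anthropic | 53.69 | 42.29 | 76.07 | 48.33 | 62.62 | 47.00 | 76.00 | 23.52 |
| Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 53.09 | 51.45 | 67.50 | 23.33 | 75.35 | 60.98 | 65.34 | 27.68 |
| Qwen 3 235B A22B Thinking 2507 | Alibaba | 52.97 | 59.40 | 68.97 | 6.67 | 73.39 | 52.18 | 69.52 | 40.64 |
| DeepSeek V3.2 | DeepSeek | 51.84 | 44.25 | 75.69 | 46.67 | 63.95 | 45.03 | 64.24 | 23.06 |
| Claude 4 Sonnet | Anthropic | 50.98 | 39.67 | 80.74 | 38.33 | 60.36 | 44.07 | 71.01 | 22.68 |
| Qwen 3 Next 80B A3B Thinking | Alibaba | 50.41 | 58.16 | 60.66 | 8.33 | 74.26 | 53.58 | 56.31 | 41.54 |
| DeepSeek V3.2 Exp | DeepSeek | 49.85 | 45.50 | 73.19 | 36.67 | 64.38 | 44.26 | 65.60 | 19.33 |
| GPT-5.2 No Thinking | OpenAI | 48.91 | 42.80 | 76.45 | 40.00 | 58.25 | 47.68 | 49.97 | 27.20 |
| Qwen 3 235B A22B Instruct 2507 | Alibaba | 48.84 | 58.43 | 69.61 | 13.33 | 68.03 | 44.72 | 66.07 | 21.72 |
| GPT-5 Nano High | OpenAI | 48.62 | 40.29 | 62.39 | 23.33 | 68.41 | 43.41 | 46.84 | 55.70 |
| Qwen 3 Next 80B A3B Instruct | Alibaba | 48.35 | 54.75 | 68.20 | 10.00 | 70.18 | 49.78 | 66.34 | 19.19 |
| Kimi K2 Instruct | Moonshot AI | 48.10 | 42.23 | 74.28 | 31.67 | 58.15 | 43.34 | 66.69 | 20.36 |
| Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 47.74 | 44.64 | 66.03 | 16.67 | 68.75 | 47.31 | 62.27 | 28.50 |
| GPT OSS 120b | OpenAI | 46.09 | 39.21 | 60.21 | 16.67 | 68.87 | 38.80 | 48.59 | 50.29 |
| Claude Haiku 4.5 | Anthropic | 45.33 | 33.94 | 72.17 | 33.33 | 57.97 | 45.13 | 57.05 | 17.75 |
| Grok Code Fast | xAI | 45.13 | 42.30 | 64.44 | 33.33 | 56.01 | 48.99 | 48.56 | 22.27 |
| Qwen 3 32B | Alibaba | 43.56 | 48.25 | 66.03 | 3.33 | 67.44 | 46.54 | 55.54 | 17.77 |
| GPT-5.1 No Thinking | OpenAI | 42.65 | 26.81 | 77.48 | 28.33 | 44.51 | 44.07 | 53.84 | 23.50 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 42.56 | 43.34 | 66.41 | 5.00 | 61.04 | 47.04 | 51.98 | 23.08 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 42.39 | 36.16 | 65.39 | 1.67 | 64.90 | 47.88 | 52.60 | 28.11 |
| Devstral 2 | Mistral | 41.24 | 27.74 | 66.79 | 43.33 | 52.52 | 39.14 | 45.67 | 13.50 |
| GLM 4.6V | Z.AI | 40.07 | 37.22 | 64.24 | 3.33 | 62.50 | 46.41 | 49.74 | 17.06 |
| Qwen 3 30B A3B | Alibaba | 39.01 | 36.68 | 48.88 | 1.67 | 65.35 | 44.92 | 54.47 | 21.11 |
| Grok 4.1 Fast (Non-Reasoning) | xAI | 33.45 | 23.35 | 54.26 | 10.00 | 38.92 | 40.61 | 50.01 | 16.98 |
| Trinity Large Preview | Arcee | 32.74 | 20.61 | 65.65 | 3.33 | 44.93 | 40.33 | 42.15 | 12.19 |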
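
The leaderboard does not say how the first score in each row is derived. From the numbers, it appears to be the unweighted mean of the seven category scores that follow it. The sketch below checks that assumption against the first and third rows. The `rows` dictionary and the `global_average` helper are illustrative names, not part of any published tooling.

```python
from statistics import mean

# Scores copied from two leaderboard rows:
# (published overall score, the seven category scores that follow it).
rows = {
    "Claude 4.6 Opus Thinking High Effort": (
        76.33, [88.67, 78.18, 61.67, 89.32, 69.89, 83.27, 63.31]),
    "GPT-5.2 High": (
        74.84, [83.21, 76.07, 51.67, 93.17, 78.16, 79.81, 61.77]),
}


def global_average(category_scores):
    """Unweighted mean of the category scores, rounded to two decimals.

    Assumption: the leaderboard weights every category equally.
    """
    return round(mean(category_scores), 2)


for model, (published, categories) in rows.items():
    computed = global_average(categories)
    print(f"{model}: computed {computed:.2f}, published {published:.2f}")
```

Under this reading, one very low category score (for example, agentic coding for several of the smaller models) pulls the overall ranking down as much as a low score in any other category.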
Rocky
About 2 months ago