AI 大模型排名:Artificial Analysis AI 大模型排行榜
本页面的排行数据来自 Artificial Analysis,该机构对超过 100 个 AI 大模型(LLM)的性能进行比较与排名,评估指标包括智能程度、价格等。此外,本排行还汇总了其他权威 AI 基准测试结果以供参考。
AI 大模型排名(基于 Artificial Analysis)
下表中,"综合指数、Coding、Math、价格"为 Artificial Analysis 的测试基准结果,"MMLU Pro、GPQA、HLE、LiveCodeBench、SciCode、Math 500、AIME"为其他 AI 测试基准结果。

| 排名 | 模型名称 | 机构 | 综合指数 | Coding | Math | 价格 ($/1M) | MMLU Pro | GPQA | HLE | LiveCodeBench | SciCode | Math 500 | AIME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 3 Pro Preview (high) | Google | 72.8 | 62.3 | 95.7 | $4.5 | 0.898 | 0.908 | 0.372 | 0.917 | 0.561 | - | - |
| 2 | Claude Opus 4.5 (Reasoning) | Anthropic | 69.8 | 60.2 | 91.3 | $10 | 0.895 | 0.866 | 0.284 | 0.871 | 0.495 | - | - |
| 3 | GPT-5.1 (high) | OpenAI | 69.7 | 57.5 | 94 | $3.438 | 0.87 | 0.873 | 0.265 | 0.868 | 0.433 | - | - |
| 4 | GPT-5 (high) | OpenAI | 68.5 | 52.7 | 94.3 | $3.438 | 0.871 | 0.854 | 0.265 | 0.846 | 0.429 | 0.994 | 0.957 |
| 5 | GPT-5 Codex (high) | OpenAI | 68.5 | 53.5 | 98.7 | $3.438 | 0.865 | 0.837 | 0.256 | 0.84 | 0.409 | - | - |
| 6 | Kimi K2 Thinking | Moonshot AI | 67 | 52.2 | 94.7 | $1.075 | 0.848 | 0.838 | 0.223 | 0.853 | 0.424 | - | - |
| 7 | GPT-5.1 Codex (high) | OpenAI | 66.9 | 52.5 | 95.7 | $3.438 | 0.86 | 0.86 | 0.234 | 0.849 | 0.402 | - | - |
| 8 | GPT-5 (medium) | OpenAI | 66.4 | 49.2 | 91.7 | $3.438 | 0.867 | 0.842 | 0.235 | 0.703 | 0.411 | 0.991 | 0.917 |
| 9 | DeepSeek V3.2 (Reasoning) | DeepSeek | 65.9 | 52.8 | 92 | $0.315 | 0.862 | 0.84 | 0.222 | 0.862 | 0.389 | - | - |
| 10 | o3 | OpenAI | 65.5 | 52.2 | 88.3 | $3.5 | 0.853 | 0.827 | 0.2 | 0.808 | 0.41 | 0.992 | 0.903 |
| 11 | Grok 4 | xAI | 65.3 | 55.1 | 92.7 | $6 | 0.866 | 0.877 | 0.239 | 0.819 | 0.457 | 0.99 | 0.943 |
| 12 | o3-pro | OpenAI | 65.3 | - | - | $35 | - | 0.845 | - | - | - | - | - |
| 13 | Gemini 3 Pro Preview (low) | Google | 64.5 | 55.8 | 86.7 | $4.5 | 0.895 | 0.887 | 0.276 | 0.857 | 0.499 | - | - |
| 14 | GPT-5 mini (high) | OpenAI | 64.3 | 51.4 | 90.7 | $0.688 | 0.837 | 0.828 | 0.197 | 0.838 | 0.392 | - | - |
| 15 | Grok 4.1 Fast (Reasoning) | xAI | 64.1 | 49.7 | 89.3 | $0.275 | 0.854 | 0.853 | 0.176 | 0.822 | 0.442 | - | - |
| 16 | Claude 4.5 Sonnet (Reasoning) | Anthropic | 62.7 | 49.8 | 88 | $6 | 0.875 | 0.834 | 0.173 | 0.714 | 0.447 | - | - |
| 17 | Nova 2.0 Pro Preview (medium) | Amazon | 62.4 | 46.1 | 89 | $3.438 | 0.83 | 0.785 | 0.089 | 0.73 | 0.427 | - | - |
| 18 | GPT-5.1 Codex mini (high) | OpenAI | 62.3 | 52.5 | 91.7 | $0.688 | 0.82 | 0.813 | 0.169 | 0.836 | 0.426 | - | - |
| 19 | GPT-5 (low) | OpenAI | 61.8 | 46.8 | 83 | $3.438 | 0.86 | 0.808 | 0.184 | 0.763 | 0.391 | 0.987 | 0.83 |
| 20 | MiniMax-M2 | MiniMax | 61.4 | 47.6 | 78.3 | $0.525 | 0.82 | 0.777 | 0.125 | 0.826 | 0.361 | - | - |
| 21 | GPT-5 mini (medium) | OpenAI | 60.8 | 45.7 | 85 | $0.688 | 0.828 | 0.803 | 0.146 | 0.692 | 0.41 | - | - |
| 22 | gpt-oss-120B (high) | OpenAI | 60.5 | 49.6 | 93.4 | $0.263 | 0.808 | 0.782 | 0.185 | 0.878 | 0.389 | - | - |
| 23 | Grok 4 Fast (Reasoning) | xAI | 60.3 | 48.4 | 89.7 | $0.275 | 0.85 | 0.847 | 0.17 | 0.832 | 0.442 | - | - |
| 24 | Claude Opus 4.5 (Non-reasoning) | Anthropic | 59.9 | 53 | 62.7 | $10 | 0.889 | 0.81 | 0.129 | 0.738 | 0.47 | - | - |
| 25 | Gemini 2.5 Pro | Google | 59.6 | 49.3 | 87.7 | $3.438 | 0.862 | 0.844 | 0.211 | 0.801 | 0.428 | 0.967 | 0.887 |
| 26 | o4-mini (high) | OpenAI | 59.6 | 48.9 | 90.7 | $1.925 | 0.832 | 0.784 | 0.175 | 0.859 | 0.465 | 0.989 | 0.94 |
| 27 | Claude 4.1 Opus (Reasoning) | Anthropic | 59.3 | 46.1 | 80.3 | $30 | 0.88 | 0.809 | 0.119 | 0.654 | 0.409 | - | - |
| 28 | DeepSeek V3.2 Speciale | DeepSeek | 58.6 | 55.4 | 96.7 | $0.315 | 0.863 | 0.871 | 0.261 | 0.896 | 0.44 | - | - |
| 29 | Nova 2.0 Lite (medium) | Amazon | 57.7 | 39.8 | 88.7 | $0.85 | 0.813 | 0.768 | 0.086 | 0.663 | 0.368 | - | - |
| 30 | DeepSeek V3.1 Terminus (Reasoning) | DeepSeek | 57.7 | 49.6 | 89.7 | $0.8 | 0.851 | 0.792 | 0.152 | 0.798 | 0.406 | - | - |
| 31 | Nova 2.0 Pro Preview (low) | Amazon | 57.6 | 39.6 | 63.3 | $3.438 | 0.822 | 0.751 | 0.052 | 0.638 | 0.387 | - | - |
| 32 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 57.5 | 44.6 | 91 | $2.625 | 0.843 | 0.79 | 0.15 | 0.788 | 0.424 | 0.984 | 0.94 |
| 33 | Grok 3 mini Reasoning (high) | xAI | 57.1 | 42.2 | 84.7 | $0.35 | 0.828 | 0.791 | 0.111 | 0.696 | 0.406 | 0.992 | 0.933 |
| 34 | Doubao Seed Code | ByteDance Seed | 57.1 | 47.4 | 79.3 | $0.407 | 0.854 | 0.764 | 0.133 | 0.766 | 0.407 | - | - |
| 35 | DeepSeek V3.2 Exp (Reasoning) | DeepSeek | 56.9 | 48.6 | 87.7 | $0.315 | 0.85 | 0.797 | 0.138 | 0.789 | 0.377 | - | - |
| 36 | Claude 4 Sonnet (Reasoning) | Anthropic | 56.5 | 45.1 | 74.3 | $6 | 0.842 | 0.777 | 0.096 | 0.655 | 0.4 | 0.991 | 0.773 |
| 37 | GLM-4.6 (Reasoning) | Z AI | 56 | 43.8 | 86 | $1 | 0.829 | 0.78 | 0.133 | 0.695 | 0.384 | - | - |
| 38 | Nova 2.0 Omni (medium) | Amazon | 56 | 35.5 | 89.7 | $0.85 | 0.809 | 0.76 | 0.068 | 0.66 | 0.362 | - | - |
| 39 | Qwen3 Max Thinking | Alibaba | 55.8 | 36.2 | 82.3 | $2.4 | 0.824 | 0.776 | 0.12 | 0.535 | 0.387 | - | - |
| 40 | Qwen3 Max | Alibaba | 55.1 | 44.7 | 80.7 | $2.4 | 0.841 | 0.764 | 0.111 | 0.767 | 0.383 | - | - |
| 41 | Claude 4.5 Haiku (Reasoning) | Anthropic | 54.6 | 43.4 | 83.7 | $2 | 0.76 | 0.672 | 0.097 | 0.615 | 0.433 | - | - |
| 42 | Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | Google | 54.4 | 42.5 | 78.3 | $0.85 | 0.842 | 0.793 | 0.127 | 0.713 | 0.405 | - | - |
| 43 | Qwen3 VL 235B A22B (Reasoning) | Alibaba | 54.4 | 38.4 | 88.3 | $2.625 | 0.836 | 0.772 | 0.101 | 0.646 | 0.399 | - | - |
| 44 | Qwen3 Next 80B A3B (Reasoning) | Alibaba | 54.3 | 42.1 | 84.3 | $1.875 | 0.824 | 0.759 | 0.117 | 0.784 | 0.388 | - | - |
| 45 | Claude 4 Opus (Reasoning) | Anthropic | 54.2 | 44.2 | 73.3 | $30 | 0.873 | 0.796 | 0.117 | 0.636 | 0.398 | 0.982 | 0.757 |
| 46 | Gemini 2.5 Pro Preview (Mar' 25) | Google | 54.1 | 46.7 | - | $3.438 | 0.858 | 0.836 | 0.171 | 0.778 | 0.395 | 0.98 | 0.87 |
| 47 | DeepSeek V3.1 (Reasoning) | DeepSeek | 54 | 47.2 | 89.7 | $0.654 | 0.851 | 0.779 | 0.13 | 0.784 | 0.391 | - | - |
| 48 | Gemini 2.5 Pro Preview (May' 25) | Google | 53.2 | - | - | $3.438 | 0.837 | 0.822 | 0.154 | 0.77 | 0.416 | 0.986 | 0.843 |
| 49 | DeepSeek V3.2 (Non-reasoning) | DeepSeek | 52.4 | 42.8 | 59 | $0.315 | 0.837 | 0.751 | 0.105 | 0.593 | 0.387 | - | - |
| 50 | gpt-oss-20B (high) | OpenAI | 52.1 | 40.7 | 89.3 | $0.1 | 0.748 | 0.688 | 0.098 | 0.777 | 0.344 | - | - |
| 51 | Magistral Medium 1.2 | Mistral | 52 | 42.3 | 82 | $2.75 | 0.815 | 0.739 | 0.096 | 0.75 | 0.392 | - | - |
| 52 | DeepSeek R1 0528 (May '25) | DeepSeek | 52 | 44.1 | 76 | $2.362 | 0.849 | 0.813 | 0.149 | 0.77 | 0.403 | 0.983 | 0.893 |
| 53 | Qwen3 VL 32B (Reasoning) | Alibaba | 51.9 | 36.4 | 84.7 | $2.625 | 0.818 | 0.733 | 0.096 | 0.738 | 0.285 | - | - |
| 54 | Seed-OSS-36B-Instruct | ByteDance Seed | 51.6 | 39.8 | 84.7 | $0.3 | 0.815 | 0.726 | 0.091 | 0.765 | 0.365 | - | - |
| 55 | Apriel-v1.5-15B-Thinker | ServiceNow | 51.6 | 39.2 | 87.5 | $0 | 0.773 | 0.713 | 0.12 | 0.728 | 0.348 | - | - |
| 56 | GLM-4.5 (Reasoning) | Z AI | 51.3 | 43.3 | 73.7 | $0.98 | 0.835 | 0.782 | 0.122 | 0.738 | 0.348 | 0.979 | 0.873 |
| 57 | Gemini 2.5 Flash (Reasoning) | Google | 51.2 | 40.5 | 73.3 | $0.85 | 0.832 | 0.79 | 0.111 | 0.695 | 0.394 | 0.981 | 0.823 |
| 58 | GPT-5 nano (high) | OpenAI | 51 | 42.3 | 83.7 | $0.138 | 0.78 | 0.676 | 0.082 | 0.789 | 0.366 | - | - |
| 59 | o3-mini (high) | OpenAI | 50.8 | 42.1 | - | $1.925 | 0.802 | 0.773 | 0.123 | 0.734 | 0.398 | 0.985 | 0.86 |
| 60 | Kimi K2 0905 | Moonshot AI | 50.4 | 38.1 | 57.3 | $1.2 | 0.819 | 0.767 | 0.063 | 0.61 | 0.307 | - | - |
| 61 | Claude 3.7 Sonnet (Reasoning) | Anthropic | 49.9 | 35.8 | 56.3 | $6 | 0.837 | 0.772 | 0.103 | 0.473 | 0.403 | 0.947 | 0.487 |
| 62 | Claude 4.5 Sonnet (Non-reasoning) | Anthropic | 49.6 | 42.9 | 37 | $6 | 0.86 | 0.727 | 0.071 | 0.59 | 0.428 | - | - |
| 63 | GPT-5 nano (medium) | OpenAI | 49.3 | 42.1 | 78.3 | $0.138 | 0.772 | 0.67 | 0.076 | 0.763 | 0.338 | - | - |
| 64 | GLM-4.5-Air | Z AI | 48.8 | 39.4 | 80.7 | $0.425 | 0.815 | 0.733 | 0.068 | 0.684 | 0.306 | 0.965 | 0.673 |
| 65 | Nova 2.0 Omni (low) | Amazon | 48.7 | 32.3 | 56 | $0.85 | 0.798 | 0.699 | 0.04 | 0.592 | 0.343 | - | - |
| 66 | Grok Code Fast 1 | xAI | 48.6 | 39.4 | 43.3 | $0.525 | 0.793 | 0.727 | 0.075 | 0.657 | 0.362 | - | - |
| 67 | Qwen3 Max (Preview) | Alibaba | 48.5 | 40.2 | 75 | $2.4 | 0.838 | 0.764 | 0.093 | 0.651 | 0.37 | - | - |
| 68 | o3-mini | OpenAI | 48.1 | 39.4 | - | $1.925 | 0.791 | 0.748 | 0.087 | 0.717 | 0.399 | 0.973 | 0.77 |
| 69 | Kimi K2 | Moonshot AI | 48.1 | 35 | 57 | $1.075 | 0.824 | 0.766 | 0.07 | 0.556 | 0.345 | 0.971 | 0.693 |
| 70 | o1-pro | OpenAI | 48 | - | - | $262.5 | - | - | - | - | - | - | - |
| 71 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google | 47.9 | 36.5 | 68.7 | $0.175 | 0.808 | 0.709 | 0.066 | 0.688 | 0.287 | - | - |
| 72 | gpt-oss-120B (low) | OpenAI | 47.5 | 37.2 | 66.7 | $0.263 | 0.775 | 0.672 | 0.052 | 0.707 | 0.36 | - | - |
| 73 | o1 | OpenAI | 47.2 | 38.6 | - | $26.25 | 0.841 | 0.747 | 0.077 | 0.679 | 0.358 | 0.97 | 0.723 |
| 74 | Nova 2.0 Lite (low) | Amazon | 46.8 | 27.9 | 46.7 | $0.85 | 0.788 | 0.698 | 0.042 | 0.469 | 0.333 | - | - |
| 75 | Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | Google | 46.7 | 37.8 | 56.7 | $0.85 | 0.836 | 0.766 | 0.078 | 0.625 | 0.375 | - | - |
| 76 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 46.4 | 36.3 | 56.3 | $0.75 | 0.805 | 0.707 | 0.098 | 0.707 | 0.333 | 0.976 | 0.907 |
| 77 | DeepSeek V3.2 Exp (Non-reasoning) | DeepSeek | 46.3 | 39.6 | 57.7 | $0.315 | 0.836 | 0.738 | 0.086 | 0.554 | 0.399 | - | - |
| 78 | Sonar Reasoning Pro | Perplexity | 46.3 | - | - | $0 | - | - | - | - | - | 0.957 | 0.79 |
| 79 | MiniMax M1 80k | MiniMax | 46.2 | 37.1 | 61 | $0.825 | 0.816 | 0.697 | 0.082 | 0.711 | 0.374 | 0.98 | 0.847 |
| 80 | Gemini 2.5 Flash Preview (Reasoning) | Google | 45.8 | - | - | $0 | 0.8 | 0.698 | 0.116 | 0.505 | 0.359 | 0.981 | 0.843 |
| 81 | DeepSeek V3.1 Terminus (Non-reasoning) | DeepSeek | 45.7 | 38.3 | 53.7 | $0.8 | 0.836 | 0.751 | 0.084 | 0.529 | 0.321 | - | - |
| 82 | Qwen3 235B A22B 2507 Instruct | Alibaba | 45.3 | 34.2 | 71.7 | $1.225 | 0.828 | 0.753 | 0.106 | 0.524 | 0.36 | 0.98 | 0.717 |
| 83 | Qwen3 VL 30B A3B (Reasoning) | Alibaba | 45.3 | 34.5 | 82.3 | $0.75 | 0.807 | 0.72 | 0.087 | 0.697 | 0.288 | - | - |
| 84 | Grok 3 | xAI | 45.3 | 30 | 58 | $6 | 0.799 | 0.693 | 0.051 | 0.425 | 0.368 | 0.87 | 0.33 |
| 85 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 45.2 | 37.8 | 76.7 | $0.175 | 0.814 | 0.748 | 0.068 | 0.737 | 0.348 | 0.983 | 0.86 |
| 86 | o1-preview | OpenAI | 44.9 | 34 | - | $28.875 | - | - | - | - | - | 0.924 | - |
| 87 | Qwen3 Next 80B A3B Instruct | Alibaba | 44.8 | 35.4 | 66.3 | $0.875 | 0.819 | 0.738 | 0.073 | 0.684 | 0.307 | - | - |
| 88 | Ling-1T | InclusionAI | 44.8 | 37.6 | 71.3 | $0.998 | 0.822 | 0.719 | 0.072 | 0.677 | 0.352 | - | - |
| 89 | DeepSeek V3.1 (Non-reasoning) | DeepSeek | 44.8 | 39 | 49.7 | $0.84 | 0.833 | 0.735 | 0.063 | 0.577 | 0.367 | - | - |
| 90 | GLM-4.6 (Non-reasoning) | Z AI | 44.7 | 38.7 | 44.3 | $1 | 0.784 | 0.632 | 0.052 | 0.561 | 0.331 | - | - |
| 91 | Claude 4.1 Opus (Non-reasoning) | Anthropic | 44.6 | - | - | $30 | - | - | - | - | - | - | - |
| 92 | Claude 4 Sonnet (Non-reasoning) | Anthropic | 44.4 | 35.9 | 38 | $6 | 0.837 | 0.683 | 0.04 | 0.449 | 0.373 | 0.934 | 0.407 |
| 93 | gpt-oss-20B (low) | OpenAI | 44.3 | 34.5 | 62.3 | $0.1 | 0.718 | 0.611 | 0.051 | 0.652 | 0.34 | - | - |
| 94 | Qwen3 VL 235B A22B Instruct | Alibaba | 44.1 | 33.9 | 70.7 | $1.225 | 0.823 | 0.712 | 0.063 | 0.594 | 0.359 | - | - |
| 95 | DeepSeek R1 (Jan '25) | DeepSeek | 43.8 | 34.4 | 68 | $2.362 | 0.844 | 0.708 | 0.093 | 0.617 | 0.357 | 0.966 | 0.683 |
| 96 | GPT-5 (minimal) | OpenAI | 43.5 | 37.4 | 31.7 | $3.438 | 0.806 | 0.673 | 0.054 | 0.558 | 0.388 | 0.861 | 0.367 |
| 97 | Qwen3 4B 2507 (Reasoning) | Alibaba | 43.4 | 30.4 | 82.7 | $0 | 0.743 | 0.667 | 0.059 | 0.641 | 0.256 | - | - |
| 98 | GPT-4.1 | OpenAI | 43.4 | 32.2 | 34.7 | $3.5 | 0.806 | 0.666 | 0.046 | 0.457 | 0.381 | 0.913 | 0.437 |
| 99 | KAT-Coder-Pro V1 | KwaiKAT | 43.3 | 33.2 | 65 | $0 | 0.814 | 0.709 | 0.071 | 0.534 | 0.355 | - | - |
| 100 | Magistral Small 1.2 | Mistral | 43 | 37.2 | 80.3 | $0.75 | 0.768 | 0.663 | 0.061 | 0.723 | 0.352 | - | - |
| 101 | GPT-5.1 (Non-reasoning) | OpenAI | 42.9 | 35.7 | 38 | $3.438 | 0.801 | 0.643 | 0.052 | 0.494 | 0.365 | - | - |
| 102 | EXAONE 4.0 32B (Reasoning) | LG AI Research | 42.6 | 37.5 | 80 | $0.7 | 0.818 | 0.739 | 0.105 | 0.747 | 0.344 | 0.977 | 0.843 |
| 103 | GPT-4.1 mini | OpenAI | 42.5 | 31.9 | 46.3 | $0.7 | 0.781 | 0.664 | 0.046 | 0.483 | 0.404 | 0.925 | 0.43 |
| 104 | Claude 4 Opus (Non-reasoning) | Anthropic | 42.3 | - | 36.3 | $30 | 0.86 | 0.701 | 0.059 | 0.542 | 0.409 | 0.941 | 0.563 |
| 105 | Qwen3 Coder 480B A35B Instruct | Alibaba | 42.3 | 37.4 | 39.3 | $3 | 0.788 | 0.618 | 0.044 | 0.585 | 0.359 | 0.942 | 0.477 |
| 106 | Nova 2.0 Pro Preview (Non-reasoning) | Amazon | 41.9 | 30.3 | 30.7 | $3.438 | 0.772 | 0.636 | 0.04 | 0.473 | 0.281 | - | - |
| 107 | GPT-5 (ChatGPT) | OpenAI | 41.8 | 34.7 | 48.3 | $3.438 | 0.82 | 0.686 | 0.058 | 0.543 | 0.378 | - | - |
| 108 | Ring-1T | InclusionAI | 41.8 | 35.8 | 89.3 | $0.998 | 0.806 | 0.595 | 0.102 | 0.643 | 0.367 | - | - |
| 109 | Qwen3 235B A22B (Reasoning) | Alibaba | 41.7 | 35.9 | 82 | $2.625 | 0.828 | 0.7 | 0.117 | 0.622 | 0.399 | 0.93 | 0.84 |
| 110 | Claude 4.5 Haiku (Non-reasoning) | Anthropic | 41.7 | 37 | 39 | $2 | 0.8 | 0.646 | 0.043 | 0.511 | 0.344 | - | - |
| 111 | GPT-5 mini (minimal) | OpenAI | 41.6 | 35 | 46.7 | $0.688 | 0.775 | 0.687 | 0.05 | 0.545 | 0.369 | - | - |
| 112 | Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | 41.6 | 34.8 | 69.7 | $1.5 | 0.829 | 0.727 | 0.103 | 0.686 | 0.252 | - | - |
| 113 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | Google | 41.6 | 33.2 | 46.7 | $0.175 | 0.796 | 0.651 | 0.046 | 0.641 | 0.285 | - | - |
| 114 | Grok 3 Reasoning Beta | xAI | 41.4 | - | - | $0 | - | - | - | - | - | - | - |
| 115 | DeepSeek V3 0324 | DeepSeek | 41.3 | 30.2 | 41 | $1.25 | 0.819 | 0.655 | 0.052 | 0.405 | 0.358 | 0.942 | 0.52 |
| 116 | Claude 3.7 Sonnet (Non-reasoning) | Anthropic | 41.1 | 32.3 | 21 | $6 | 0.803 | 0.656 | 0.048 | 0.394 | 0.376 | 0.85 | 0.223 |
| 117 | Qwen3 VL 32B Instruct | Alibaba | 41 | 29.8 | 68.3 | $1.225 | 0.791 | 0.671 | 0.063 | 0.514 | 0.301 | - | - |
| 118 | Gemini 2.5 Flash (Non-reasoning) | Google | 40.4 | 30 | 60.3 | $0.85 | 0.809 | 0.683 | 0.051 | 0.495 | 0.291 | 0.932 | 0.5 |
| 119 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 40.1 | 27.6 | 53.3 | $0.175 | 0.759 | 0.625 | 0.064 | 0.593 | 0.193 | 0.969 | 0.703 |
| 120 | Qwen3 Omni 30B A3B (Reasoning) | Alibaba | 40 | 34 | 74 | $0.43 | 0.792 | 0.726 | 0.073 | 0.679 | 0.306 | - | - |
| 121 | MiniMax M1 40k | MiniMax | 40 | 35.2 | 13.7 | $0.825 | 0.808 | 0.682 | 0.075 | 0.657 | 0.378 | 0.972 | 0.813 |
| 122 | Ring-flash-2.0 | InclusionAI | 39.5 | 28.9 | 83.7 | $0.247 | 0.793 | 0.725 | 0.089 | 0.628 | 0.168 | - | - |
| 123 | o1-mini | OpenAI | 39.2 | - | - | $0 | 0.742 | 0.603 | 0.049 | 0.576 | 0.323 | 0.944 | 0.603 |
| 124 | Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | 39.2 | 34.6 | 68.7 | $0.198 | 0.811 | 0.699 | 0.079 | 0.653 | 0.341 | - | - |
| 125 | Qwen3 32B (Reasoning) | Alibaba | 38.7 | 30.9 | 73 | $2.625 | 0.798 | 0.668 | 0.083 | 0.546 | 0.354 | 0.961 | 0.807 |
| 126 | Grok 4 Fast (Non-reasoning) | xAI | 38.6 | 28.1 | 41.3 | $0.275 | 0.73 | 0.606 | 0.05 | 0.401 | 0.329 | - | - |
| 127 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 38.5 | 33.7 | 63.7 | $0.9 | 0.825 | 0.728 | 0.081 | 0.641 | 0.347 | 0.952 | 0.747 |
| 128 | Qwen3 VL 30B A3B Instruct | Alibaba | 38.5 | 28 | 72.3 | $0.35 | 0.764 | 0.695 | 0.064 | 0.476 | 0.308 | - | - |
| 129 | GPT-4.5 (Preview) | OpenAI | 38.4 | - | - | $0 | - | - | - | - | - | - | - |
| 130 | Mistral Large 3 | Mistral | 38.4 | 32.5 | 38 | $0.75 | 0.807 | 0.68 | 0.041 | 0.465 | 0.362 | - | - |
| 131 | Ling-flash-2.0 | InclusionAI | 38.3 | 32.6 | 65.3 | $0.247 | 0.777 | 0.657 | 0.063 | 0.589 | 0.289 | - | - |
| 132 | Grok 4.1 Fast (Non-reasoning) | xAI | 38.3 | 27.7 | 34.3 | $0.275 | 0.743 | 0.637 | 0.05 | 0.399 | 0.296 | - | - |
| 133 | QwQ 32B | Alibaba | 37.9 | - | 29 | $0.473 | 0.764 | 0.593 | 0.082 | 0.631 | 0.358 | 0.957 | 0.78 |
| 134 | Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google | 37.7 | 24.1 | - | $0 | 0.798 | 0.701 | 0.071 | 0.321 | 0.329 | 0.944 | 0.5 |
| 135 | Solar Pro 2 (Reasoning) | Upstage | 37.7 | 31.5 | 61.3 | $0.5 | 0.805 | 0.687 | 0.07 | 0.616 | 0.302 | 0.967 | 0.69 |
| 136 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | 37.2 | 31.9 | 69.7 | $0.07 | 0.742 | 0.57 | 0.046 | 0.724 | 0.22 | - | - |
| 137 | Qwen3 30B A3B 2507 Instruct | Alibaba | 37 | 29.2 | 66.3 | $0.35 | 0.777 | 0.659 | 0.068 | 0.515 | 0.304 | 0.975 | 0.727 |
| 138 | GLM-4.5V (Reasoning) | Z AI | 37 | 29.2 | 73 | $0.85 | 0.788 | 0.684 | 0.059 | 0.604 | 0.221 | - | - |
| 139 | Qwen3 30B A3B (Reasoning) | Alibaba | 36.7 | 27.1 | 72.3 | $0.75 | 0.777 | 0.616 | 0.066 | 0.506 | 0.285 | 0.959 | 0.753 |
| 140 | OLMo 3 32B Think | Allen Institute for AI | 36.3 | 32.4 | 73.7 | $0.237 | 0.759 | 0.61 | 0.059 | 0.672 | 0.286 | - | - |
| 141 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | 36.1 | 30.6 | 62.3 | $0.07 | 0.739 | 0.557 | 0.04 | 0.701 | 0.209 | - | - |
| 142 | Solar Pro 2 (Preview) (Reasoning) | Upstage | 36.1 | - | - | $0 | 0.768 | 0.578 | 0.057 | 0.462 | 0.164 | 0.9 | 0.663 |
| 143 | Qwen3 14B (Reasoning) | Alibaba | 36 | 29.1 | 55.7 | $1.313 | 0.774 | 0.604 | 0.043 | 0.523 | 0.316 | 0.961 | 0.763 |
| 144 | Llama 4 Maverick | Meta | 35.8 | 26.4 | 19.3 | $0.422 | 0.809 | 0.671 | 0.048 | 0.397 | 0.331 | 0.889 | 0.39 |
| 145 | GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | 35.6 | - | 25.7 | $7.5 | 0.803 | 0.655 | 0.05 | 0.425 | 0.366 | 0.893 | 0.327 |
| 146 | Nova 2.0 Lite (Non-reasoning) | Amazon | 35.6 | 21.6 | 33.7 | $0.85 | 0.743 | 0.603 | 0.03 | 0.346 | 0.24 | - | - |
| 147 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | 35.5 | 18.7 | 54.7 | $0 | 0.785 | 0.643 | 0.065 | 0.277 | 0.282 | 0.959 | 0.583 |
| 148 | Mistral Medium 3.1 | Mistral | 35.4 | 28.1 | 38.3 | $0.8 | 0.683 | 0.588 | 0.044 | 0.406 | 0.338 | - | - |
| 149 | Gemini 2.0 Pro Experimental (Feb '25) | Google | 34.6 | 25.5 | - | $0 | 0.805 | 0.622 | 0.068 | 0.347 | 0.312 | 0.923 | 0.36 |
| 150 | Sonar Reasoning | Perplexity | 34.2 | - | - | $2 | - | 0.623 | - | - | - | 0.921 | 0.77 |
| 151 | Nova 2.0 Omni (Non-reasoning) | Amazon | 34.1 | 21.6 | 37 | $0.85 | 0.719 | 0.555 | 0.039 | 0.305 | 0.279 | - | - |
| 152 | Gemini 2.5 Flash Preview (Non-reasoning) | Google | 34.1 | - | - | $0 | 0.783 | 0.594 | 0.05 | 0.406 | 0.233 | 0.926 | 0.433 |
| 153 | Gemini 2.0 Flash (Feb '25) | Google | 33.6 | 23.4 | 21.7 | $0.175 | 0.779 | 0.623 | 0.053 | 0.334 | 0.333 | 0.93 | 0.33 |
| 154 | Mistral Medium 3 | Mistral | 33.6 | 25.6 | 30.3 | $0.8 | 0.76 | 0.578 | 0.043 | 0.4 | 0.331 | 0.907 | 0.44 |
| 155 | Qwen3 Coder 30B A3B Instruct | Alibaba | 33.4 | 27.4 | 29 | $0.9 | 0.706 | 0.516 | 0.04 | 0.403 | 0.278 | 0.893 | 0.297 |
| 156 | Magistral Medium 1 | Mistral | 33.2 | 30.3 | 40.3 | $2.75 | 0.753 | 0.679 | 0.095 | 0.527 | 0.297 | 0.917 | 0.7 |
| 157 | ERNIE 4.5 300B A47B | Baidu | 32.9 | 27.9 | 41.3 | $0.485 | 0.776 | 0.811 | 0.035 | 0.467 | 0.315 | 0.931 | 0.493 |
| 158 | DeepSeek R1 Distill Qwen 32B | DeepSeek | 32.7 | - | 63 | $0.285 | 0.739 | 0.615 | 0.055 | 0.27 | 0.376 | 0.941 | 0.687 |
| 159 | Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | 32.6 | 32.8 | 15.3 | $1.5 | 0.729 | 0.536 | 0.042 | 0.546 | 0.346 | - | - |
| 160 | DeepSeek V3 (Dec '24) | DeepSeek | 32.5 | 25.9 | 26 | $0.625 | 0.752 | 0.557 | 0.036 | 0.359 | 0.354 | 0.887 | 0.253 |
| 161 | Nova Premier | Amazon | 32.3 | 22 | 17.3 | $5 | 0.733 | 0.569 | 0.047 | 0.317 | 0.279 | 0.839 | 0.17 |
| 162 | Qwen3 VL 8B (Reasoning) | Alibaba | 32.1 | 20.3 | 30.7 | $0.66 | 0.749 | 0.579 | 0.033 | 0.353 | 0.219 | - | - |
| 163 | Magistral Small 1 | Mistral | 31.9 | 26.6 | 41.3 | $0.75 | 0.746 | 0.641 | 0.072 | 0.514 | 0.241 | 0.963 | 0.713 |
| 164 | OLMo 3 7B Think | Allen Institute for AI | 31.9 | 27.9 | 70.7 | $0.14 | 0.655 | 0.516 | 0.057 | 0.617 | 0.212 | - | - |
| 165 | Gemini 2.0 Flash (experimental) | Google | 31.8 | - | - | $0 | 0.782 | 0.636 | 0.047 | 0.21 | 0.34 | 0.911 | 0.3 |
| 166 | DeepSeek R1 0528 Qwen3 8B | DeepSeek | 31 | 24.4 | 63.7 | $0.068 | 0.739 | 0.612 | 0.056 | 0.513 | 0.204 | 0.932 | 0.65 |
| 167 | Qwen2.5 Max | Alibaba | 30.7 | - | - | $2.8 | 0.762 | 0.587 | 0.045 | 0.359 | 0.337 | 0.835 | 0.233 |
| 168 | Ministral 14B (Dec '25) | Mistral | 30.5 | 21 | 30 | $0.2 | 0.693 | 0.572 | 0.046 | 0.351 | 0.236 | - | - |
| 169 | Qwen3 4B 2507 Instruct | Alibaba | 30.4 | 20 | 52.3 | $0 | 0.672 | 0.517 | 0.047 | 0.377 | 0.181 | - | - |
| 170 | EXAONE 4.0 32B (Non-reasoning) | LG AI Research | 30.3 | 24.6 | 39.3 | $0.7 | 0.768 | 0.628 | 0.049 | 0.472 | 0.252 | 0.939 | 0.47 |
| 171 | Qwen3 Omni 30B A3B Instruct | Alibaba | 30.2 | 20.8 | 52.3 | $0.43 | 0.725 | 0.62 | 0.051 | 0.422 | 0.186 | - | - |
| 172 | Solar Pro 2 (Non-reasoning) | Upstage | 30.2 | 23.8 | 30 | $0.5 | 0.75 | 0.561 | 0.038 | 0.424 | 0.248 | 0.889 | 0.407 |
| 173 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 30.1 | 19.9 | 35.3 | $0.175 | 0.724 | 0.474 | 0.037 | 0.4 | 0.177 | 0.926 | 0.5 |
| 174 | Solar Pro 2 (Preview) (Non-reasoning) | Upstage | 30 | - | - | $0 | 0.725 | 0.544 | 0.038 | 0.385 | 0.272 | 0.871 | 0.297 |
| 175 | Gemini 1.5 Pro (Sep '24) | Google | 30 | 23.6 | - | $0 | 0.75 | 0.589 | 0.049 | 0.316 | 0.295 | 0.876 | 0.23 |
| 176 | Claude 3.5 Sonnet (Oct '24) | Anthropic | 29.9 | 30.2 | - | $6 | 0.772 | 0.599 | 0.039 | 0.381 | 0.366 | 0.771 | 0.157 |
| 177 | DeepSeek R1 Distill Llama 70B | DeepSeek | 29.9 | 19.7 | 53.7 | $0.875 | 0.795 | 0.402 | 0.061 | 0.266 | 0.312 | 0.935 | 0.67 |
| 178 | Qwen3 235B A22B (Non-reasoning) | Alibaba | 29.9 | 23.3 | 23.7 | $1.225 | 0.762 | 0.613 | 0.047 | 0.343 | 0.299 | 0.902 | 0.327 |
| 179 | DeepSeek R1 Distill Qwen 14B | DeepSeek | 29.7 | - | 55.7 | $0.15 | 0.74 | 0.484 | 0.044 | 0.376 | 0.239 | 0.949 | 0.667 |
| 180 | Qwen3 14B (Non-reasoning) | Alibaba | 29.2 | 19.8 | 58 | $0.613 | 0.675 | 0.47 | 0.042 | 0.28 | 0.265 | 0.871 | 0.28 |
| 181 | Mistral Small 3.2 | Mistral | 29.1 | 20.1 | 27 | $0.15 | 0.681 | 0.505 | 0.043 | 0.275 | 0.264 | 0.883 | 0.323 |
| 182 | GPT-5 nano (minimal) | OpenAI | 29.1 | 27.5 | 27.3 | $0.138 | 0.556 | 0.428 | 0.041 | 0.47 | 0.291 | - | - |
| 183 | GPT-4o (Aug '24) | OpenAI | 29 | - | - | $4.375 | - | 0.521 | 0.029 | 0.317 | - | 0.795 | 0.117 |
| 184 | Qwen2.5 Instruct 72B | Alibaba | 29 | 19.5 | 14 | $0 | 0.72 | 0.491 | 0.042 | 0.276 | 0.267 | 0.858 | 0.16 |
| 185 | Sonar | Perplexity | 28.8 | - | - | $1 | 0.689 | 0.471 | 0.073 | 0.295 | 0.229 | 0.817 | 0.487 |
| 186 | Qwen3 8B (Reasoning) | Alibaba | 28.3 | 21.8 | 19 | $0.66 | 0.743 | 0.589 | 0.042 | 0.406 | 0.226 | 0.904 | 0.747 |
| 187 | Sonar Pro | Perplexity | 28.2 | - | - | $6 | 0.755 | 0.578 | 0.079 | 0.275 | 0.226 | 0.745 | 0.29 |
| 188 | Ministral 8B (Dec '25) | Mistral | 28.2 | 18.4 | 31.7 | $0.15 | 0.642 | 0.471 | 0.043 | 0.303 | 0.208 | - | - |
| 189 | Llama 3.1 Instruct 405B | Meta | 28.1 | 22.2 | 3 | $4.188 | 0.732 | 0.515 | 0.042 | 0.305 | 0.299 | 0.703 | 0.213 |
| 190 | Llama 4 Scout | Meta | 28.1 | 16.1 | 14 | $0.241 | 0.752 | 0.587 | 0.043 | 0.299 | 0.17 | 0.844 | 0.283 |
| 191 | QwQ 32B-Preview | Alibaba | 28 | - | - | $0.135 | 0.648 | 0.557 | 0.048 | 0.337 | 0.038 | 0.91 | 0.453 |
| 192 | Devstral Medium | Mistral | 27.9 | 23.9 | 4.7 | $0.8 | 0.708 | 0.492 | 0.038 | 0.337 | 0.294 | 0.707 | 0.067 |
| 193 | Llama 3.3 Instruct 70B | Meta | 27.9 | 19.2 | 7.7 | $0.62 | 0.713 | 0.498 | 0.04 | 0.288 | 0.26 | 0.773 | 0.3 |
| 194 | Ling-mini-2.0 | InclusionAI | 27.8 | 19 | 49.3 | $0.122 | 0.671 | 0.562 | 0.05 | 0.429 | 0.135 | - | - |
| 195 | GPT-4.1 nano | OpenAI | 27.3 | 20.7 | 24 | $0.175 | 0.657 | 0.512 | 0.039 | 0.326 | 0.259 | 0.848 | 0.237 |
| 196 | Qwen3 VL 4B (Reasoning) | Alibaba | 27.3 | 16.8 | 25.7 | $0 | 0.7 | 0.494 | 0.044 | 0.32 | 0.171 | - | - |
| 197 | Devstral Small (Jul '25) | Mistral | 27.2 | 18.5 | 29.3 | $0.15 | 0.622 | 0.414 | 0.037 | 0.254 | 0.243 | 0.635 | 0.003 |
| 198 | Qwen3 VL 8B Instruct | Alibaba | 27.1 | 17.6 | 27.3 | $0.31 | 0.686 | 0.427 | 0.029 | 0.332 | 0.174 | - | - |
| 199 | GPT-4o (Nov '24) | OpenAI | 27 | 24 | 6 | $4.375 | 0.748 | 0.543 | 0.033 | 0.309 | 0.333 | 0.759 | 0.15 |
| 200 | Command A | Cohere | 26.9 | 19.2 | 13 | $4.375 | 0.712 | 0.527 | 0.046 | 0.287 | 0.281 | 0.819 | 0.097 |
| 201 | Mistral Large 2 (Nov '24) | Mistral | 26.8 | 21.4 | 14 | $3 | 0.697 | 0.486 | 0.04 | 0.293 | 0.292 | 0.736 | 0.11 |
| 202 | Gemini 2.0 Flash-Lite (Feb '25) | Google | 26.8 | - | - | $0.131 | 0.724 | 0.535 | 0.036 | 0.185 | 0.25 | 0.873 | 0.277 |
| 203 | Exaone 4.0 1.2B (Reasoning) | LG AI Research | 26.7 | 20.3 | 50.3 | $0 | 0.588 | 0.515 | 0.058 | 0.516 | 0.093 | - | - |
| 204 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | 26.6 | 18.8 | 8 | $0.175 | 0.692 | 0.481 | 0.043 | 0.29 | 0.238 | 0.77 | 0.137 |
| 205 | Qwen3 30B A3B (Non-reasoning) | Alibaba | 26.5 | 21.6 | 21.7 | $0.35 | 0.71 | 0.515 | 0.046 | 0.322 | 0.264 | 0.863 | 0.26 |
| 206 | Qwen3 32B (Non-reasoning) | Alibaba | 26.4 | - | 19.7 | $1.225 | 0.727 | 0.535 | 0.043 | 0.288 | 0.28 | 0.869 | 0.303 |
| 207 | GPT-4o (May '24) | OpenAI | 26.3 | 24.2 | - | $7.5 | 0.74 | 0.526 | 0.028 | 0.334 | 0.309 | 0.791 | 0.11 |
| 208 | Gemini 2.0 Flash-Lite (Preview) | Google | 26.3 | - | - | $0.131 | - | 0.542 | 0.044 | 0.179 | 0.247 | 0.873 | 0.303 |
| 209 | Kimi Linear 48B A3B Instruct | Moonshot AI | 26.1 | 22.8 | 36.3 | $0 | 0.585 | 0.412 | 0.027 | 0.378 | 0.199 | - | - |
| 210 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | 26.1 | - | 50 | $0 | 0.556 | 0.408 | 0.051 | 0.493 | 0.101 | 0.947 | 0.707 |
| 211 | GLM-4.5V (Non-reasoning) | Z AI | 26 | 20.1 | 15.3 | $0.9 | 0.751 | 0.573 | 0.036 | 0.352 | 0.188 | - | - |
| 212 | Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | NVIDIA | 25.9 | 17 | 7.7 | $0 | 0.698 | 0.517 | 0.035 | 0.28 | 0.229 | 0.775 | 0.193 |
| 213 | Reka Flash 3 | Reka AI | 25.9 | 23.4 | 33.7 | $0.35 | 0.669 | 0.529 | 0.051 | 0.435 | 0.267 | 0.893 | 0.51 |
| 214 | Qwen3 4B (Reasoning) | Alibaba | 25.6 | - | 22.3 | $0.398 | 0.696 | 0.522 | 0.051 | 0.465 | 0.035 | 0.933 | 0.657 |
| 215 | Llama 3.1 Tulu3 405B | Allen Institute for AI | 25.4 | - | - | $0 | 0.716 | 0.516 | 0.035 | 0.291 | 0.302 | 0.778 | 0.133 |
| 216 | Claude 3.5 Sonnet (June '24) | Anthropic | 25.4 | 26 | - | $6 | 0.751 | 0.56 | 0.037 | - | 0.316 | 0.695 | 0.097 |
| 217 | GPT-4o (ChatGPT) | OpenAI | 25.3 | - | - | $7.5 | 0.773 | 0.511 | 0.037 | - | 0.334 | 0.797 | 0.103 |
| 218 | Qwen3 VL 4B Instruct | Alibaba | 25.2 | 14.2 | 37 | $0 | 0.634 | 0.371 | 0.037 | 0.29 | 0.137 | - | - |
| 219 | Nova Pro | Amazon | 25 | 16.6 | 7 | $1.4 | 0.691 | 0.499 | 0.034 | 0.233 | 0.208 | 0.786 | 0.107 |
| 220 | Pixtral Large | Mistral | 25 | - | 2.3 | $3 | 0.701 | 0.505 | 0.036 | 0.261 | 0.292 | 0.714 | 0.07 |
| 221 | Mistral Small 3.1 | Mistral | 24.9 | 18.3 | 3.7 | $0.15 | 0.659 | 0.454 | 0.048 | 0.212 | 0.265 | 0.707 | 0.093 |
| 222 | Grok 2 (Dec '24) | xAI | 24.7 | - | - | $4 | 0.709 | 0.51 | 0.038 | 0.267 | 0.285 | 0.778 | 0.133 |
| 223 | Gemini 1.5 Flash (Sep '24) | Google | 24.4 | - | - | $0 | 0.68 | 0.463 | 0.035 | 0.273 | 0.267 | 0.827 | 0.18 |
| 224 | GPT-4 Turbo | OpenAI | 24.2 | 21.5 | - | $15 | 0.694 | - | 0.033 | 0.291 | 0.319 | 0.737 | 0.15 |
| 225 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | 23.8 | 18.2 | 11.3 | $0.198 | 0.664 | 0.491 | 0.036 | 0.269 | 0.277 | - | - |
| 226 | Llama 3.1 Nemotron Instruct 70B | NVIDIA | 23.6 | 14.8 | 11 | $0.6 | 0.69 | 0.465 | 0.046 | 0.169 | 0.233 | 0.733 | 0.247 |
| 227 | Grok Beta | xAI | 23 | - | - | $0 | 0.703 | 0.471 | 0.047 | 0.241 | 0.295 | 0.737 | 0.103 |
| 228 | Qwen3 8B (Non-reasoning) | Alibaba | 22.9 | 13 | 24.3 | $0.31 | 0.643 | 0.452 | 0.028 | 0.202 | 0.168 | 0.828 | 0.243 |
| 229 | Qwen2.5 Instruct 32B | Alibaba | 22.9 | - | - | $0 | 0.697 | 0.466 | 0.038 | 0.248 | 0.229 | 0.805 | 0.11 |
| 230 | Phi-4 | Microsoft Azure | 22.7 | 17.6 | 18 | $0.219 | 0.714 | 0.575 | 0.041 | 0.231 | 0.26 | 0.81 | 0.143 |
| 231 | Granite 4.0 H Small | IBM | 22.7 | 16.1 | 13.7 | $0.107 | 0.624 | 0.416 | 0.037 | 0.251 | 0.209 | - | - |
| 232 | Llama 3.1 Instruct 70B | Meta | 22.6 | 17.6 | 4 | $0.56 | 0.676 | 0.409 | 0.046 | 0.232 | 0.267 | 0.649 | 0.173 |
| 233 | Qwen3 1.7B (Reasoning) | Alibaba | 22.4 | 11.7 | 38.7 | $0.398 | 0.57 | 0.356 | 0.048 | 0.308 | 0.043 | 0.894 | 0.51 |
| 234 | Mistral Large 2 (Jul '24) | Mistral | 22.3 | - | 0 | $3 | 0.683 | 0.472 | 0.032 | 0.267 | 0.271 | 0.714 | 0.093 |
| 235 | OLMo 3 7B Instruct | Allen Institute for AI | 22.2 | 12.3 | 41.3 | $0.125 | 0.522 | 0.4 | 0.058 | 0.266 | 0.103 | - | - |
| 236 | Gemma 3 27B Instruct | Google | 22.1 | 12.8 | 20.7 | $0 | 0.669 | 0.428 | 0.047 | 0.137 | 0.212 | 0.883 | 0.253 |
| 237 | Ministral 3B (Dec '25) | Mistral | 21.8 | 13 | 22 | $0.1 | 0.524 | 0.358 | 0.053 | 0.247 | 0.144 | - | - |
| 238 | Qwen2.5 Coder Instruct 32B | Alibaba | 21.8 | - | - | $0.141 | 0.635 | 0.417 | 0.038 | 0.295 | 0.271 | 0.767 | 0.12 |
| 239 | GPT-4 | OpenAI | 21.5 | 13.1 | - | $37.5 | - | - | - | - | - | - | - |
| 240 | Nova Lite | Amazon | 21.5 | 10.4 | 7 | $0.105 | 0.59 | 0.433 | 0.046 | 0.167 | 0.139 | 0.765 | 0.107 |
| 241 | GPT-4o mini | OpenAI | 21.2 | - | 14.7 | $0.263 | 0.648 | 0.426 | 0.04 | 0.234 | 0.229 | 0.789 | 0.117 |
| 242 | Mistral Small 3 | Mistral | 21.2 | - | 4.3 | $0.15 | 0.652 | 0.462 | 0.041 | 0.252 | 0.236 | 0.715 | 0.08 |
| 243 | Jamba Reasoning 3B | AI21 Labs | 20.9 | 9.2 | 10.7 | $0 | 0.577 | 0.333 | 0.046 | 0.21 | 0.059 | - | - |
| 244 | Jamba 1.7 Large | AI21 Labs | 20.8 | 13 | 2.3 | $3.5 | 0.577 | 0.39 | 0.038 | 0.181 | 0.188 | 0.6 | 0.057 |
| 245 | Qwen3 4B (Non-reasoning) | Alibaba | 20.7 | - | - | $0.188 | 0.586 | 0.398 | 0.037 | 0.233 | 0.167 | 0.843 | 0.213 |
| 246 | DeepSeek-V2.5 (Dec '24) | DeepSeek | 20.7 | - | - | $0 | - | - | - | - | - | 0.763 | - |
| 247 | Claude 3 Opus | Anthropic | 20.6 | 19.5 | - | $30 | 0.696 | 0.489 | 0.031 | 0.279 | 0.233 | 0.641 | 0.033 |
| 248 | Exaone 4.0 1.2B (Non-reasoning) | LG AI Research | 20.5 | 12.2 | 24 | $0 | 0.5 | 0.424 | 0.058 | 0.293 | 0.074 | - | - |
| 249 | Gemma 3 12B Instruct | Google | 20.4 | 10.6 | 18.3 | $0 | 0.595 | 0.349 | 0.048 | 0.137 | 0.174 | 0.853 | 0.22 |
| 250 | DeepSeek-V2.5 | DeepSeek | 20.2 | - | - | $0 | - | - | - | - | - | - | - |
| 251 | Gemini 2.0 Flash Thinking Experimental (Dec '24) | Google | 20.2 | - | - | $0 | - | - | - | - | - | 0.48 | - |
| 252 | Claude 3.5 Haiku | Anthropic | 20.2 | - | - | $1.6 | 0.634 | 0.408 | 0.035 | 0.314 | 0.274 | 0.721 | 0.033 |
| 253 | Devstral Small (May '25) | Mistral | 19.6 | - | - | $0.15 | 0.632 | 0.434 | 0.04 | 0.258 | 0.245 | 0.684 | 0.067 |
| 254 | Mistral Saba | Mistral | 19.6 | - | - | $0 | 0.611 | 0.424 | 0.041 | - | 0.241 | 0.677 | 0.13 |
| 255 | DeepSeek R1 Distill Llama 8B | DeepSeek | 19.5 | - | 41.3 | $0 | 0.543 | 0.302 | 0.042 | 0.233 | 0.119 | 0.853 | 0.333 |
| 256 | Gemini 1.5 Pro (May '24) | Google | 19.2 | 19.8 | - | $0 | 0.657 | 0.371 | 0.039 | 0.244 | 0.274 | 0.673 | 0.08 |
| 257 | R1 1776 | Perplexity | 19.1 | - | - | $0 | - | - | - | - | - | 0.954 | - |
| 258 | Qwen2.5 Turbo | Alibaba | 19.1 | - | - | $0.087 | 0.633 | 0.41 | 0.042 | 0.163 | 0.153 | 0.805 | 0.12 |
| 259 | Reka Flash (Sep '24) | Reka AI | 19.1 | - | - | $0.35 | - | - | - | - | - | 0.529 | - |
| 260 | Solar Mini | Upstage | 18.9 | - | - | $0.15 | - | - | - | - | - | 0.331 | - |
| 261 | Llama 3.2 Instruct 90B (Vision) | Meta | 18.9 | - | - | $0.72 | 0.671 | 0.432 | 0.049 | 0.214 | 0.24 | 0.629 | 0.05 |
| 262 | Grok-1 | xAI | 18.2 | - | - | $0 | - | - | - | - | - | - | - |
| 263 | Qwen2 Instruct 72B | Alibaba | 18.1 | - | - | $0 | 0.622 | 0.371 | 0.037 | 0.159 | 0.229 | 0.701 | 0.147 |
| 264 | Nova Micro | Amazon | 17.7 | 8.3 | 6 | $0.061 | 0.531 | 0.358 | 0.047 | 0.14 | 0.094 | 0.703 | 0.08 |
| 265 | LFM2 8B A1B | Liquid AI | 17.4 | 7.3 | 25.3 | $0 | 0.505 | 0.344 | 0.049 | 0.151 | 0.068 | - | - |
| 266 | Llama 3.1 Instruct 8B | Meta | 16.9 | 8.5 | 4.3 | $0.1 | 0.476 | 0.259 | 0.051 | 0.116 | 0.132 | 0.519 | 0.077 |
| 267 | Gemini 1.5 Flash-8B | Google | 16.3 | - | - | $0 | 0.569 | 0.359 | 0.045 | 0.217 | 0.229 | 0.689 | 0.033 |
| 268 | Granite 4.0 Micro | IBM | 16.2 | 10.4 | 6 | $0 | 0.447 | 0.336 | 0.051 | 0.18 | 0.119 | - | - |
| 269 | Phi-4 Mini Instruct | Microsoft Azure | 15.7 | 7.8 | 6.7 | $0 | 0.465 | 0.331 | 0.042 | 0.126 | 0.108 | 0.696 | 0.03 |
| 270 | Gemma 3n E4B Instruct | Google | 15.5 | 8.3 | 14.3 | $0.025 | 0.488 | 0.296 | 0.044 | 0.146 | 0.081 | 0.771 | 0.137 |
| 271 | Llama 3.2 Instruct 11B (Vision) | Meta | 15.5 | 7.7 | 1.7 | $0.16 | 0.464 | 0.221 | 0.052 | 0.11 | 0.112 | 0.516 | 0.093 |
| 272 | DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | Nous Research | 15.5 | - | - | $0 | 0.58 | 0.382 | 0.039 | 0.195 | 0.228 | 0.595 | 0.047 |
| 273 | Granite 3.3 8B (Non-reasoning) | IBM | 15.2 | 7.6 | 6.7 | $0.085 | 0.468 | 0.338 | 0.042 | 0.127 | 0.101 | 0.665 | 0.047 |
| 274 | Jamba 1.5 Large | AI21 Labs | 14.8 | - | - | $3.5 | 0.572 | 0.427 | 0.04 | 0.143 | 0.163 | 0.606 | 0.047 |
| 275 | Jamba 1.7 Mini | AI21 Labs | 14.8 | 5.1 | 0.3 | $0.25 | 0.388 | 0.322 | 0.045 | 0.061 | 0.093 | 0.258 | 0.013 |
| 276 | Hermes 3 - Llama-3.1 70B | Nous Research | 14.7 | - | - | $0.3 | 0.571 | 0.401 | 0.041 | 0.188 | 0.231 | 0.538 | 0.023 |
| 277 | Gemma 3 4B Instruct | Google | 14.7 | 6.4 | 12.7 | $0 | 0.417 | 0.291 | 0.052 | 0.112 | 0.073 | 0.766 | 0.063 |
| 278 | DeepSeek-Coder-V2 | DeepSeek | 14.5 | - | - | $0 | - | - | - | - | - | 0.743 | - |
| 279 | Phi-3 Medium Instruct 14B | Microsoft Azure | 14.4 | 8.9 | 1.3 | $0.297 | 0.543 | 0.326 | 0.045 | 0.15 | 0.118 | 0.463 | 0.013 |
| 280 | OLMo 2 32B | Allen Institute for AI | 14.4 | 4.9 | 3.3 | $0 | 0.511 | 0.328 | 0.037 | 0.068 | 0.08 | - | - |
| 281 | Qwen3 1.7B (Non-reasoning) | Alibaba | 14.4 | 6.5 | 7.3 | $0.188 | 0.411 | 0.283 | 0.052 | 0.126 | 0.069 | 0.717 | 0.097 |
| 282 | Jamba 1.6 Large | AI21 Labs | 14.3 | - | - | $3.5 | 0.565 | 0.387 | 0.04 | 0.172 | 0.184 | 0.58 | 0.047 |
| 283 | Qwen3 0.6B (Reasoning) | Alibaba | 14.2 | 5 | 18 | $0.398 | 0.347 | 0.239 | 0.057 | 0.121 | 0.028 | 0.75 | 0.1 |
| 284 | Gemini 1.5 Flash (May '24) | Google | 14 | - | - | $0 | 0.574 | 0.324 | 0.042 | 0.196 | 0.181 | 0.554 | 0.093 |
| 285 | Granite 4.0 H 1B | IBM | 13.7 | 6.6 | 6.3 | $0 | 0.277 | 0.263 | 0.05 | 0.115 | 0.082 | - | - |
| 286 | Granite 4.0 1B | IBM | 13.3 | 4.5 | 6.3 | $0 | 0.325 | 0.281 | 0.051 | 0.047 | 0.087 | - | - |
| 287 | Claude 3 Sonnet | Anthropic | 13.3 | - | - | $6 | 0.579 | 0.4 | 0.038 | 0.175 | 0.229 | 0.414 | 0.047 |
| 288 | Llama 3 Instruct 70B | Meta | 13 | - | - | $0.88 | 0.574 | 0.379 | 0.044 | 0.198 | 0.189 | 0.483 | 0 |
| 289 | Mistral Small (Sep '24) | Mistral | 13 | - | - | $0.3 | 0.529 | 0.381 | 0.043 | 0.141 | 0.156 | 0.563 | 0.063 |
| 290 | Gemini 1.0 Ultra | Google | 12.8 | 17.6 | - | $0 | - | - | - | - | - | - | - |
| 291 | Phi-3 Mini Instruct 3.8B | Microsoft Azure | 12.7 | 6.9 | 0.3 | $0.228 | 0.435 | 0.319 | 0.044 | 0.116 | 0.09 | 0.457 | 0.04 |
| 292 | Gemma 3n E4B Instruct Preview (May '25) | Google | 12.5 | - | - | $0 | 0.483 | 0.278 | 0.049 | 0.138 | 0.086 | 0.749 | 0.107 |
| 293 | Phi-4 Multimodal Instruct | Microsoft Azure | 12.4 | - | - | $0 | 0.485 | 0.315 | 0.044 | 0.131 | 0.11 | 0.693 | 0.093 |
| 294 | Qwen2.5 Coder Instruct 7B | Alibaba | 12.2 | - | - | $0 | 0.473 | 0.339 | 0.048 | 0.126 | 0.148 | 0.66 | 0.053 |
| 295 | Mistral Large (Feb '24) | Mistral | 11.9 | - | - | $6 | 0.515 | 0.351 | 0.034 | 0.178 | 0.208 | 0.527 | 0 |
| 296 | LFM2 2.6B | Liquid AI | 11.8 | 3.8 | 8.3 | $0 | 0.298 | 0.306 | 0.052 | 0.081 | 0.025 | - | - |
| 297 | Mixtral 8x22B Instruct | Mistral | 11.7 | - | - | $0 | 0.537 | 0.332 | 0.041 | 0.148 | 0.188 | 0.545 | 0 |
| 298 | Gemma 3n E2B Instruct | Google | 11.3 | 5.2 | 10.3 | $0 | 0.378 | 0.229 | 0.04 | 0.095 | 0.052 | 0.691 | 0.09 |
| 299 | Llama 2 Chat 7B | Meta | 11.3 | - | - | $0.1 | 0.164 | 0.227 | 0.058 | 0.002 | 0 | 0.059 | 0 |
| 300 | Llama 3.2 Instruct 3B | Meta | 11.2 | - | 3.3 | $0.06 | 0.347 | 0.255 | 0.052 | 0.083 | 0.052 | 0.489 | 0.067 |
| 301 | Qwen3 0.6B (Non-reasoning) | Alibaba | 11 | 3.8 | 10.3 | $0.188 | 0.231 | 0.231 | 0.052 | 0.073 | 0.041 | 0.521 | 0.017 |
| 302 | Qwen1.5 Chat 110B | Alibaba | 10.5 | - | - | $0 | - | 0.289 | - | - | - | - | - |
| 303 | LFM2 1.2B | Liquid AI | 9.7 | 1.5 | 3.3 | $0 | 0.257 | 0.228 | 0.057 | 0.02 | 0.025 | - | - |
| 304 | Claude 2.1 | Anthropic | 9.7 | 14 | - | $0 | 0.495 | 0.319 | 0.042 | 0.195 | 0.184 | 0.374 | 0.033 |
| 305 | Claude 3 Haiku | Anthropic | 9.6 | - | - | $0.5 | - | - | - | 0.154 | 0.186 | 0.394 | 0.01 |
| 306 | OLMo 2 7B | Allen Institute for AI | 9.5 | 2.6 | 0.7 | $0 | 0.282 | 0.288 | 0.055 | 0.041 | 0.037 | - | - |
| 307 | Molmo 7B-D | Allen Institute for AI | 9.3 | 2.5 | 0 | $0 | 0.371 | 0.24 | 0.051 | 0.039 | 0.036 | - | - |
| 308 | Llama 3.2 Instruct 1B | Meta | 8.9 | 1.2 | 0 | $0.053 | 0.2 | 0.196 | 0.053 | 0.019 | 0.017 | 0.14 | 0 |
| 309 | DeepSeek-V2-Chat | DeepSeek | 8.6 | - | - | $0 | - | - | - | - | - | - | - |
| 310 | DeepSeek R1 Distill Qwen 1.5B | DeepSeek | 8.6 | - | 22 | $0 | 0.269 | 0.098 | 0.033 | 0.07 | 0.066 | 0.687 | 0.177 |
| 311 | Claude 2.0 | Anthropic | 8.6 | 12.9 | - | $0 | 0.486 | 0.344 | - | 0.171 | 0.194 | - | 0 |
| 312 | Mistral Small (Feb '24) | Mistral | 8.5 | - | - | $1.5 | 0.419 | 0.302 | 0.044 | 0.111 | 0.134 | 0.562 | 0.007 |
| 313 | Mistral Medium | Mistral | 8.4 | - | - | $4.088 | 0.491 | 0.349 | 0.034 | 0.099 | 0.118 | 0.405 | 0.037 |
| 314 | GPT-3.5 Turbo | OpenAI | 8.3 | 10.7 | - | $0.75 | 0.462 | 0.297 | - | - | - | 0.441 | - |
| 315 | Granite 4.0 H 350M | IBM | 8.2 | 1.2 | 1.3 | $0 | 0.127 | 0.257 | 0.064 | 0.019 | 0.017 | - | - |
| 316 | Granite 4.0 350M | IBM | 7.7 | 1.1 | 0 | $0 | 0.124 | 0.261 | 0.057 | 0.024 | 0.009 | - | - |
| 317 | Arctic Instruct | Snowflake | 7.6 | - | - | $0 | - | - | - | - | - | - | - |
| 318 | Qwen Chat 72B | Alibaba | 7.6 | - | - | $0 | - | - | - | - | - | - | - |
| 319 | LFM 40B | Liquid AI | 7.3 | - | - | $0 | 0.425 | 0.327 | 0.049 | 0.096 | 0.071 | 0.48 | 0.023 |
| 320 | Llama 3 Instruct 8B | Meta | 7 | - | - | $0.07 | 0.405 | 0.296 | 0.051 | 0.096 | 0.119 | 0.499 | 0 |
| 321 | Gemma 3 1B Instruct | Google | 6.8 | 0.8 | 3.3 | $0 | 0.135 | 0.237 | 0.052 | 0.017 | 0.007 | 0.484 | 0 |
| 322 | PALM-2 | Google | 6.6 | 4.6 | - | $0 | - | - | - | - | - | - | - |
| 323 | Gemini 1.0 Pro | Google | 6.2 | - | - | $0 | 0.431 | 0.277 | 0.046 | 0.116 | 0.117 | 0.403 | 0.007 |
| 324 | DeepSeek Coder V2 Lite Instruct | DeepSeek | 6.1 | - | - | $0 | 0.429 | 0.319 | 0.053 | 0.158 | 0.139 | - | - |
| 325 | Gemma 3 270M | Google | 5.6 | 0.1 | 2.3 | $0 | 0.055 | 0.224 | 0.042 | 0.003 | 0 | - | - |
| 326 | DeepSeek LLM 67B Chat (V1) | DeepSeek | 5.6 | - | - | $0 | - | - | - | - | - | - | - |
| 327 | Llama 2 Chat 70B | Meta | 5.6 | - | - | $0 | 0.406 | 0.327 | 0.05 | 0.098 | - | 0.323 | 0 |
| 328 | Command-R+ (Apr '24) | Cohere | 5.5 | - | - | $6 | 0.432 | 0.323 | 0.045 | 0.122 | 0.118 | 0.279 | 0.007 |
| 329 | Llama 2 Chat 13B | Meta | 5.5 | - | - | $0 | 0.406 | 0.321 | 0.047 | 0.098 | 0.118 | 0.329 | 0.017 |
| 330 | OpenChat 3.5 (1210) | OpenChat | 5.4 | - | - | $0 | 0.31 | 0.23 | 0.048 | 0.115 | - | 0.307 | 0 |
| 331 | DBRX Instruct | Databricks | 5.3 | - | - | $0 | 0.397 | 0.331 | 0.066 | 0.093 | 0.118 | 0.279 | 0.03 |
| 332 | Jamba 1.5 Mini | AI21 Labs | 4 | - | - | $0.25 | 0.371 | 0.302 | 0.051 | 0.062 | 0.08 | 0.357 | 0.01 |
| 333 | Jamba 1.6 Mini | AI21 Labs | 3.3 | - | - | $0.25 | 0.367 | 0.3 | 0.046 | 0.071 | 0.101 | 0.257 | 0.033 |
| 334 | Mixtral 8x7B Instruct | Mistral | 2.6 | - | - | $0.54 | 0.387 | 0.292 | 0.045 | 0.066 | 0.028 | 0.299 | 0 |
| 335 | DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | Nous Research | 1.8 | - | - | $0 | 0.365 | 0.27 | 0.043 | 0.085 | 0.091 | 0.218 | 0 |
| 336 | Llama 65B | Meta | 1 | - | - | $0 | - | - | - | - | - | - | - |
| 337 | Claude Instant | Anthropic | 1 | 7.8 | - | $0 | 0.434 | 0.33 | 0.038 | 0.109 | - | 0.264 | 0 |
| 338 | Mistral 7B Instruct | Mistral | 1 | - | - | $0.25 | 0.245 | 0.177 | 0.043 | 0.046 | 0.024 | 0.121 | 0 |
| 339 | Command-R (Mar '24) | Cohere | 1 | - | - | $0.75 | 0.338 | 0.284 | 0.048 | 0.048 | 0.062 | 0.164 | 0.007 |
| 340 | Qwen Chat 14B | Alibaba | 1 | - | - | $0 | - | - | - | - | - | - | - |
| 341 | GPT-4o Realtime (Dec '24) | OpenAI | - | - | - | $0 | - | - | - | - | - | - | - |
| 342 | GPT-3.5 Turbo (0613) | OpenAI | - | - | - | $0 | - | - | - | - | - | - | - |
| 343 | Cogito v2.1 (Reasoning) | Deep Cogito | - | 41.8 | 72.7 | $1.25 | 0.849 | 0.768 | 0.11 | 0.688 | 0.41 | - | - |
| 344 | GPT-4o mini Realtime (Dec '24) | OpenAI | - | - | - | $0 | - | - | - | - | - | - | - |
| 345 | DeepSeek-OCR | DeepSeek | - | - | - | $0.048 | - | - | - | - | - | - | - |
* 价格为每百万 Token 的混合价格 (3:1 输入/输出)
Artificial Analysis AI 大模型排名介绍
Artificial Analysis 是一家独立的 AI 基准测试与分析机构,为开发者、研究人员、企业及其他 AI 用户提供参考。它同时测试专有模型与开放权重模型,并以端到端用户体验为核心,测量实际使用中的响应时间、输出速度及成本。
质量基准涵盖语言理解与推理能力;性能基准则关注首次令牌到达时间、输出速度、端到端响应时间等用户可直接感知的指标。为在不同模型之间进行统一、公平的对比,Artificial Analysis 区分 OpenAI Tokens 与各模型的原生 Tokens,并按 3:1 的输入/输出比例计算混合价格。基准对象涵盖模型、端点、系统与提供商,覆盖语言模型、语音、图像生成等多个方向,旨在帮助用户准确了解不同 AI 服务的真实表现与性价比。
Artificial Analysis AI 测试基准介绍
上下文窗口
模型可处理的输入与输出令牌的最大总数。输出令牌通常另有更低的数量上限(具体因模型而异)。
输出速度
模型生成令牌期间每秒接收到的令牌数(对于支持流式传输的模型,从收到 API 返回的第一个数据块之后开始计算)。
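作为参考,输出速度可按下式近似估算(仅为示意;这里假设以首个令牌/数据块到达时刻作为生成阶段的计时起点,具体测量口径以 Artificial Analysis 官方方法为准):

```latex
\text{输出速度 (tokens/s)} \approx \frac{\text{输出令牌总数}}{\text{端到端响应时间} - \text{首次令牌到达时间}}
```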
延迟(首次令牌到达时间)
从发送 API 请求到收到第一个令牌所需的时间(以秒为单位)。对于会输出推理令牌的推理模型,以收到第一个推理令牌为准;对于不支持流式传输的模型,则为收到完整回复所需的时间。
价格
每百万令牌的价格,以美元/百万令牌表示,为输入令牌与输出令牌价格按 3:1 比例计算的混合价格。
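按照上述 3:1 输入/输出比例的通常理解,混合价格可近似按下式计算(仅为示意,具体口径以 Artificial Analysis 官方说明为准):

```latex
\text{混合价格} = \frac{3 \times P_{\text{输入}} + 1 \times P_{\text{输出}}}{4}
```

例如,假设某模型输入价格为 $1/1M Tokens、输出价格为 $4/1M Tokens(均为假设值),则混合价格约为 (3 × 1 + 1 × 4) / 4 = $1.75/1M Tokens。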
常见 AI 大模型测试基准介绍
MMLU Pro
Massive Multitask Language Understanding Professional。MMLU 的增强版,旨在评估大语言模型的推理能力。它通过过滤简单问题、增加选项数量(从 4 个增加到 10 个)以及强调复杂的多步推理,来解决原版 MMLU 的局限性。涵盖 14 个领域的约 12,000 个问题。
GPQA
Graduate-Level Google-Proof Q&A Benchmark。一个具有挑战性的研究生级别问答基准,旨在评估 AI 系统在物理、化学和生物等复杂科学领域提供真实信息的能力。这些问题被设计为“防谷歌搜索”,即需要深度理解和推理,而不仅仅是简单的事实回忆。
HLE
Humanity's Last Exam。一个全面的评估框架,旨在测试 AI 系统在模仿人类水平推理、解决问题和知识整合方面的能力。包含 100 多个学科的 2,500 到 3,000 个专家级问题,强调多步推理和处理新颖场景的能力。
LiveCodeBench
一个无污染的 LLM 代码能力评估基准。它持续从 LeetCode、AtCoder 和 Codeforces 等平台的竞赛中收集新问题,以防止训练集数据污染。除了代码生成,还评估自我修复、代码执行和测试输出预测等能力。
SciCode
评估语言模型解决现实科学研究问题代码生成能力的基准。涵盖物理、数学、材料科学、生物和化学等 6 个领域的 16 个子领域。问题源自真实的科学工作流,通常需要知识回忆、推理和代码合成。
Math 500
旨在评估语言模型数学推理和解决问题能力的基准。包含 500 个来自 AMC 和 AIME 等高水平高中数学竞赛的难题,涵盖代数、组合数学、几何、数论和预微积分等领域。
AIME
American Invitational Mathematics Examination。基于美国数学邀请赛问题的基准,被认为是测试高级数学推理的最具挑战性的 AI 测试之一。包含 30 个“奥林匹克级别”的整数答案数学问题,测试多步推理、抽象和解决问题的能力。