This page uses opencode/gemini-3.1-pro as the comparison baseline. Every chart and table below answers the same question: where this model leads, where it lags, and what that costs in quality, time, and request pressure.
The charts treat opencode/gemini-3.1-pro as the zero line. Positive bars mean a model sits above the baseline on that metric; negative bars mean it trails.
Use this page to decide whether another model beats opencode/gemini-3.1-pro by enough to justify switching.
| Model | Composite | Delta vs baseline | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time |
|---|---|---|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 0.789 | +0.498 | 85% | +48% | 15.17 | +2.47 | $0.4215 | -$5.4321 | 27m 33s |
| opencode/kimi-k2.5 | 0.785 | +0.494 | 89% | +52% | 14.25 | +1.55 | $0.9122 | -$4.9414 | 41m 05s |
| opencode/claude-opus-4-6 | 0.67 | +0.379 | 89% | +52% | 14.88 | +2.18 | $21.8757 | +$16.0221 | 40m 04s |
| opencode/glm-5 | 0.623 | +0.332 | 78% | +41% | 11.57 | -1.13 | $6.4339 | +$0.5803 | 20m 10s |
| opencode/big-pickle | 0.615 | +0.324 | 67% | +30% | 15.39 | +2.69 | $0.0000 | -$5.8536 | 36m 28s |
| opencode/gpt-5.4 | 0.609 | +0.318 | 78% | +41% | 11.00 | -1.70 | $8.9827 | +$3.1291 | 32m 47s |
| opencode/claude-sonnet-4-6 | 0.593 | +0.302 | 78% | +41% | 16.43 | +3.73 | $11.8406 | +$5.9869 | 42m 31s |
| opencode/glm-5.1 | 0.547 | +0.256 | 67% | +30% | 12.06 | -0.64 | $1.8816 | -$3.9720 | 64m 39s |
| opencode/minimax-m2.5 | 0.481 | +0.19 | 56% | +19% | 18.87 | +6.17 | $0.6413 | -$5.2123 | 32m 15s |
| opencode/gpt-5.4-mini | 0.425 | +0.134 | 48% | +11% | 9.54 | -3.16 | $1.0606 | -$4.7930 | 21m 48s |
| opencode/minimax-m2.5-free | 0.415 | +0.124 | 59% | +22% | 16.19 | +3.49 | $0.0000 | -$5.8536 | 41m 34s |
| opencode/gemini-3-flash | 0.415 | +0.124 | 59% | +22% | 21.81 | +9.11 | $2.4307 | -$3.4229 | 62m 52s |
| opencode/gemini-3.1-pro (baseline) | 0.291 | +0.000 | 37% | +0% | 12.70 | +0.00 | $5.8536 | +$0.0000 | 51m 25s |
| opencode/nemotron-3-super-free | 0.181 | -0.109 | 26% | -11% | 19.43 | +6.73 | $0.0000 | -$5.8536 | 109m 00s |
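For reference, every delta column above is straight subtraction against the baseline row. A minimal sketch of that derivation, assuming each row is a simple record of the table's metrics (the field names and row layout here are illustrative, not the benchmark's internal schema):

```python
# Illustrative recomputation of the delta columns against the baseline row.
# Field names and the Row layout are assumptions for this sketch, not the
# benchmark's internal schema.
from dataclasses import dataclass

@dataclass
class Row:
    model: str
    composite: float
    success: float   # fraction, e.g. 0.85 for 85%
    orpt: float      # ORPT as reported in the table (unit unchanged)
    cost: float      # USD per run

BASELINE = Row("opencode/gemini-3.1-pro", 0.291, 0.37, 12.70, 5.8536)

def deltas(row: Row, base: Row = BASELINE) -> dict[str, float]:
    """Each delta column is row metric minus baseline metric."""
    return {
        "composite_delta": round(row.composite - base.composite, 4),
        "success_delta": round(row.success - base.success, 4),
        "orpt_delta": round(row.orpt - base.orpt, 4),
        "cost_delta": round(row.cost - base.cost, 4),
    }

nano = Row("opencode/gpt-5.4-nano", 0.789, 0.85, 15.17, 0.4215)
print(deltas(nano))
# {'composite_delta': 0.498, 'success_delta': 0.48,
#  'orpt_delta': 2.47, 'cost_delta': -5.4321}
```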
This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin. A baseline result of dnf (did not finish) marks runs that never completed, which is why those rows show n/a for cost. A sort-key sketch follows the table.
| Task | Field read | Baseline result | Winner (score) | Gap to winner | Baseline cost | Baseline time |
|---|---|---|---|---|---|---|
| SELinux registry volume label repair | Clear separation | failed | opencode/kimi-k2.5 (1.0) | 1.0 | $0.0385 | 19s |
| RHEL k3s node preparation repair | Competitive split | failed | opencode/gpt-5.4-nano (1.0) | 1.0 | $0.1624 | 1m 44s |
| Event status shell summary | Competitive split | dnf | opencode/big-pickle (1.0) | 1.0 | n/a | 45s |
| Kubernetes rollout repair | Clear separation | failed | opencode/gpt-5.4-mini (1.0) | 1.0 | $0.1657 | 1m 35s |
| Bootstrap phase validation repair | Competitive split | failed | opencode/kimi-k2.5 (0.993) | 0.993 | $0.3025 | 2m 19s |
| ExternalDNS RFC2136 repair | Competitive split | failed | opencode/kimi-k2.5 (0.982) | 0.982 | $0.1398 | 1m 02s |
| nftables router ingress repair | Competitive split | failed | opencode/gpt-5.4-nano (0.98) | 0.98 | $0.2268 | 1m 50s |
| Pre-ArgoCD bootstrap sequencing | Competitive split | failed | opencode/gpt-5.4-nano (0.967) | 0.967 | $0.3456 | 2m 43s |
| Log level rollup shell script | Competitive split | dnf | opencode/big-pickle (0.965) | 0.965 | n/a | 1m 00s |
| RHEL edge firewalld router repair | Competitive split | failed | opencode/gpt-5.4-nano (0.953) | 0.953 | $0.1203 | 1m 09s |
| RHEL NetworkManager bridge VLAN repair | Competitive split | failed | opencode/gpt-5.4-nano (0.951) | 0.951 | $0.1941 | 1m 33s |
| Log audit shell script | Competitive split | dnf | opencode/gpt-5.4-nano (0.935) | 0.935 | n/a | 1m 15s |
| Workspace runtime access convergence | Competitive split | failed | opencode/gpt-5.4-nano (0.932) | 0.932 | $0.5298 | 3m 12s |
| Wildcard TLS route coverage | Competitive split | failed | opencode/kimi-k2.5 (0.929) | 0.929 | $0.1606 | 1m 42s |
| MetalLB ingress address pool repair | Competitive split | failed | opencode/gpt-5.4-nano (0.928) | 0.928 | $0.0619 | 34s |
| AppArmor dnsmasq profile repair | Competitive split | failed | opencode/gpt-5.4-nano (0.918) | 0.918 | $0.2495 | 2m 08s |
| Traefik forwarded header trust repair | Competitive split | failed | opencode/kimi-k2.5 (0.913) | 0.913 | $0.4734 | 4m 42s |
| K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle (1.0) | 0.227 | $0.1489 | 52s |
| MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle (0.954) | 0.209 | $0.4583 | 4m 54s |
| Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano (0.942) | 0.208 | $0.6600 | 3m 55s |
| CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle (0.964) | 0.199 | $0.2791 | 1m 33s |
| Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle (0.985) | 0.185 | $0.1576 | 1m 00s |
| Ansible nginx role completion | Competitive split | passed | opencode/big-pickle (0.963) | 0.185 | $0.1680 | 1m 34s |
| Docker Compose observability fix | Competitive split | passed | opencode/gpt-5.4-nano (0.975) | 0.163 | $0.3295 | 3m 57s |
| Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano (0.95) | 0.147 | $0.2299 | 1m 29s |
| Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 (0.978) | 0.146 | $0.0872 | 1m 20s |
| GitOps workspace render validation | Competitive split | passed | opencode/big-pickle (0.941) | 0.133 | $0.1641 | 1m 21s |
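The ordering described above can be expressed as a composite sort key. A sketch of one such key, assuming each task record carries flags for unsolved and single-solver status plus the gap-to-winner score (all field and function names here are hypothetical, not the report's schema):

```python
# Hypothetical sort key reproducing the "most revealing tasks first" order:
# unsolved tasks, then single-solver tasks, then largest gap to the winner.
from typing import NamedTuple

class Task(NamedTuple):
    name: str
    unsolved: bool        # no model passed the task
    single_solver: bool   # exactly one model passed
    gap_to_winner: float  # winner score minus baseline score

def revealing_key(t: Task) -> tuple:
    # False sorts before True, so negate the flags to float them to the top;
    # negate the gap so larger gaps come earlier.
    return (not t.unsolved, not t.single_solver, -t.gap_to_winner)

tasks = [
    Task("K3s registry mirror trust repair", False, False, 0.227),
    Task("SELinux registry volume label repair", True, False, 1.0),
]
for t in sorted(tasks, key=revealing_key):
    print(t.name)
# SELinux registry volume label repair   (unsolved, sorts first)
# K3s registry mirror trust repair
```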
Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. The table reads from the baseline's side: the task record counts baseline wins, losses, and ties against each challenger, and every edge is baseline minus challenger, so a negative composite edge means the challenger is ahead.
| Challenger | Task record (baseline W-L) | Composite edge | Success edge | Cost edge | Time edge | ORPT edge |
|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 1-22, 4 ties | -0.498 | -48% | +$5.4321 | +23m 52s | -2.47 |
| opencode/kimi-k2.5 | 1-24, 2 ties | -0.494 | -52% | +$4.9414 | +10m 20s | -1.55 |
| opencode/claude-opus-4-6 | 8-17, 2 ties | -0.379 | -52% | -$16.0221 | +11m 21s | -2.18 |
| opencode/glm-5 | 4-19, 4 ties | -0.332 | -41% | -$0.5803 | +31m 15s | +1.13 |
| opencode/big-pickle | 3-17, 7 ties | -0.324 | -30% | +$5.8536 | +14m 57s | -2.69 |
| opencode/gpt-5.4 | 5-17, 5 ties | -0.318 | -41% | -$3.1291 | +18m 38s | +1.70 |
| opencode/claude-sonnet-4-6 | 8-15, 4 ties | -0.302 | -41% | -$5.9869 | +8m 54s | -3.73 |
| opencode/glm-5.1 | 4-16, 7 ties | -0.256 | -30% | +$3.9720 | -13m 14s | +0.64 |
| opencode/minimax-m2.5 | 3-15, 9 ties | -0.19 | -19% | +$5.2123 | +19m 10s | -6.17 |
| opencode/gpt-5.4-mini | 3-12, 12 ties | -0.134 | -11% | +$4.7930 | +29m 37s | +3.16 |
| opencode/minimax-m2.5-free | 10-7, 10 ties | -0.124 | -22% | +$5.8536 | +9m 51s | -3.49 |
| opencode/gemini-3-flash | 10-9, 8 ties | -0.124 | -22% | +$3.4229 | -11m 27s | -9.11 |
| opencode/nemotron-3-super-free | 10-1, 16 ties | +0.109 | +11% | +$5.8536 | -57m 35s | -6.73 |
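Since the sign convention is easy to misread, here is the subtraction written out, using values from the tables above (the helper name is hypothetical):

```python
# Edge columns are baseline minus challenger: a negative composite edge means
# the challenger is ahead, and a positive cost edge means the baseline is the
# more expensive run. Helper name is an assumption for this sketch.
def baseline_edge(baseline_value: float, challenger_value: float) -> float:
    return round(baseline_value - challenger_value, 4)

# opencode/gemini-3.1-pro (baseline) vs opencode/gpt-5.4-nano:
print(baseline_edge(0.291, 0.789))    # -0.498  composite edge (challenger ahead)
print(baseline_edge(5.8536, 0.4215))  #  5.4321 cost edge (baseline costs more)
```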
Benchmark results only matter in context: this section pairs the observed outcomes with the catalog metadata and operating characteristics behind them.
The OpenRouter reference blend for google/gemini-3.1-pro-preview is $4.50 per 1M tokens, assuming a 3:1 input:output token mix.
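A 3:1 blend weights per-token prices three parts input to one part output. A quick sketch of that arithmetic, using hypothetical per-million-token prices (the actual input and output rates are not listed on this page; the pair below is just one combination that yields the quoted blend):

```python
# Blended USD per 1M tokens under a 3:1 input:output mix:
#   blend = (3 * input_price + 1 * output_price) / 4
# The prices below are placeholders, not Gemini's actual rates.
def blended_price(input_usd_per_m: float, output_usd_per_m: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    total = input_parts + output_parts
    return (input_parts * input_usd_per_m
            + output_parts * output_usd_per_m) / total

print(blended_price(2.0, 12.0))  # 4.5 -- matches the quoted reference blend
```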