This page uses opencode/gpt-5.4 as the comparison baseline. Every chart and table below answers the same question: where this model leads, where it lags, and what it costs in quality, time, and request pressure (ORPT, requests per task).
The charts treat opencode/gpt-5.4 as zero: positive bars mean a model sits above the baseline on that metric, negative bars mean it trails. Use them to decide whether another model beats opencode/gpt-5.4 by enough to justify the change.
| Model | Composite | Composite delta | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time |
|---|---|---|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 0.789 | +0.181 | 85% | +7% | 15.17 | +4.17 | $0.4215 | -$8.5611 | 27m 33s |
| opencode/kimi-k2.5 | 0.785 | +0.176 | 89% | +11% | 14.25 | +3.25 | $0.9122 | -$8.0705 | 41m 05s |
| opencode/claude-opus-4-6 | 0.670 | +0.062 | 89% | +11% | 14.88 | +3.88 | $21.8757 | +$12.8930 | 40m 04s |
| opencode/glm-5 | 0.623 | +0.014 | 78% | +0% | 11.57 | +0.57 | $6.4339 | -$2.5488 | 20m 10s |
| opencode/big-pickle | 0.615 | +0.006 | 67% | -11% | 15.39 | +4.39 | $0.0000 | -$8.9827 | 36m 28s |
| opencode/gpt-5.4 (baseline) | 0.609 | 0.000 | 78% | 0% | 11.00 | 0.00 | $8.9827 | $0.0000 | 32m 47s |
| opencode/claude-sonnet-4-6 | 0.593 | -0.016 | 78% | +0% | 16.43 | +5.43 | $11.8406 | +$2.8579 | 42m 31s |
| opencode/glm-5.1 | 0.547 | -0.062 | 67% | -11% | 12.06 | +1.06 | $1.8816 | -$7.1011 | 64m 39s |
| opencode/minimax-m2.5 | 0.481 | -0.128 | 56% | -22% | 18.87 | +7.87 | $0.6413 | -$8.3414 | 32m 15s |
| opencode/gpt-5.4-mini | 0.425 | -0.184 | 48% | -30% | 9.54 | -1.46 | $1.0606 | -$7.9221 | 21m 48s |
| opencode/minimax-m2.5-free | 0.415 | -0.194 | 59% | -19% | 16.19 | +5.19 | $0.0000 | -$8.9827 | 41m 34s |
| opencode/gemini-3-flash | 0.415 | -0.194 | 59% | -19% | 21.81 | +10.81 | $2.4307 | -$6.5520 | 62m 52s |
| opencode/gemini-3.1-pro | 0.291 | -0.318 | 37% | -41% | 12.70 | +1.70 | $5.8536 | -$3.1291 | 51m 25s |
| opencode/nemotron-3-super-free | 0.181 | -0.427 | 26% | -52% | 19.43 | +8.43 | $0.0000 | -$8.9827 | 109m 00s |
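As a concrete illustration of how the delta columns are derived, the sketch below recomputes them as baseline-relative differences. The dict layout and field names are assumptions for illustration, not the benchmark's actual schema; the table's deltas are computed before rounding, so recomputing from the displayed values can differ in the last digit (e.g. 0.789 - 0.609 = 0.180 vs the reported +0.181).

```python
# Recompute baseline-relative deltas from per-model summary rows.
# The dict layout is hypothetical; the point is the arithmetic:
# every "delta" column in the table is (model value - baseline value).

BASELINE = {"composite": 0.609, "success": 0.78, "orpt": 11.00, "cost": 8.9827}

models = {
    "opencode/gpt-5.4-nano": {"composite": 0.789, "success": 0.85, "orpt": 15.17, "cost": 0.4215},
    "opencode/kimi-k2.5":    {"composite": 0.785, "success": 0.89, "orpt": 14.25, "cost": 0.9122},
}

for name, m in models.items():
    deltas = {k: m[k] - BASELINE[k] for k in BASELINE}
    print(f"{name}: composite {deltas['composite']:+.3f}, "
          f"success {deltas['success']:+.0%}, "
          f"ORPT {deltas['orpt']:+.2f}, cost {deltas['cost']:+.4f} USD")
```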
This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin.
| Task | Field read | Baseline result | Winner (score) | Gap to winner | Baseline cost | Baseline time |
|---|---|---|---|---|---|---|
| RHEL k3s node preparation repair | Competitive split | failed | opencode/gpt-5.4-nano 1.000 | 1.000 | $0.3012 | 51s |
| Bootstrap phase validation repair | Competitive split | failed | opencode/kimi-k2.5 0.993 | 0.993 | $0.4329 | 1m 27s |
| nftables router ingress repair | Competitive split | failed | opencode/gpt-5.4-nano 0.980 | 0.980 | $0.3503 | 1m 13s |
| Docker Compose observability fix | Competitive split | failed | opencode/gpt-5.4-nano 0.975 | 0.975 | $0.2825 | 46s |
| Pre-ArgoCD bootstrap sequencing | Competitive split | failed | opencode/gpt-5.4-nano 0.967 | 0.967 | $0.4099 | 1m 21s |
| RHEL edge firewalld router repair | Competitive split | failed | opencode/gpt-5.4-nano 0.953 | 0.953 | $0.2606 | 41s |
| K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle 1.000 | 0.242 | $0.2694 | 1m 16s |
| Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle 0.985 | 0.237 | $0.4056 | 2m 07s |
| Event status shell summary | Competitive split | passed | opencode/big-pickle 1.000 | 0.217 | $0.2351 | 33s |
| Kubernetes rollout repair | Clear separation | passed | opencode/gpt-5.4-mini 1.000 | 0.209 | $0.3143 | 59s |
| Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 0.978 | 0.203 | $0.2631 | 43s |
| RHEL NetworkManager bridge VLAN repair | Competitive split | passed | opencode/gpt-5.4-nano 0.951 | 0.200 | $0.3071 | 1m 02s |
| SELinux registry volume label repair | Clear separation | passed | opencode/kimi-k2.5 1.000 | 0.198 | $0.3297 | 1m 09s |
| CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle 0.964 | 0.191 | $0.3379 | 1m 12s |
| Log level rollup shell script | Competitive split | passed | opencode/big-pickle 0.965 | 0.173 | $0.2457 | 48s |
| Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano 0.942 | 0.172 | $0.4417 | 1m 14s |
| MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle 0.954 | 0.172 | $0.3493 | 1m 11s |
| Ansible nginx role completion | Competitive split | passed | opencode/big-pickle 0.963 | 0.171 | $0.2740 | 41s |
| Log audit shell script | Competitive split | passed | opencode/gpt-5.4-nano 0.935 | 0.167 | $0.2761 | 1m 06s |
| ExternalDNS RFC2136 repair | Competitive split | passed | opencode/kimi-k2.5 0.982 | 0.167 | $0.3297 | 1m 02s |
| GitOps workspace render validation | Competitive split | passed | opencode/big-pickle 0.941 | 0.161 | $0.3886 | 1m 20s |
| Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano 0.932 | 0.158 | $0.5689 | 3m 35s |
| Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano 0.950 | 0.146 | $0.3245 | 1m 05s |
| Wildcard TLS route coverage | Competitive split | passed | opencode/kimi-k2.5 0.929 | 0.133 | $0.3123 | 47s |
| Traefik forwarded header trust repair | Competitive split | passed | opencode/kimi-k2.5 0.913 | 0.130 | $0.3067 | 2m 18s |
| AppArmor dnsmasq profile repair | Competitive split | passed | opencode/gpt-5.4-nano 0.918 | 0.126 | $0.3399 | 1m 29s |
| MetalLB ingress address pool repair | Competitive split | passed | opencode/gpt-5.4-nano 0.928 | 0.125 | $0.3259 | 53s |
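A sketch of how the gap column and the ordering above can be produced. It assumes each task yields a score in [0, 1] and a failed run scores 0, which is why the gap on failed tasks equals the winner's score outright; the tuple layout is hypothetical, and the baseline scores here are back-derived from the table (winner score minus gap).

```python
# Sort tasks so the most revealing rows come first: baseline failures
# (largest possible gap), then passes ordered by descending gap to the winner.
# Tuples: (task, baseline_passed, baseline_score, winner_score) - hypothetical layout.

tasks = [
    ("RHEL k3s node preparation repair", False, 0.000, 1.000),
    ("K3s registry mirror trust repair", True, 0.758, 1.000),
    ("MetalLB ingress address pool repair", True, 0.803, 0.928),
]

# Failures first (passed=False sorts before True), then by descending gap.
ordered = sorted(tasks, key=lambda r: (r[1], -(r[3] - r[2])))
for task, passed, base, win in ordered:
    status = "passed" if passed else "failed"
    print(f"{task}: {status}, gap to winner {win - base:.3f}")
```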
Pairwise task records and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. All figures are read from the baseline's side: the record counts opencode/gpt-5.4's wins, losses, and ties per task, and each edge is baseline minus challenger, so positive values favor the baseline.
| Challenger | Task record (W-L-T) | Composite edge | Success edge | Cost edge | Time edge | ORPT edge |
|---|---|---|---|---|---|---|
| opencode/nemotron-3-super-free | 21-0-6 | +0.427 | +52% | +$8.9827 | -76m 13s | -8.43 |
| opencode/gemini-3.1-pro | 17-5-5 | +0.318 | +41% | +$3.1291 | -18m 38s | -1.70 |
| opencode/minimax-m2.5-free | 21-1-5 | +0.194 | +19% | +$8.9827 | -8m 46s | -5.19 |
| opencode/gemini-3-flash | 21-2-4 | +0.194 | +19% | +$6.5520 | -30m 04s | -10.81 |
| opencode/gpt-5.4-mini | 8-13-6 | +0.184 | +30% | +$7.9221 | +10m 59s | +1.46 |
| opencode/gpt-5.4-nano | 3-23-1 | -0.181 | -7% | +$8.5611 | +5m 15s | -4.17 |
| opencode/kimi-k2.5 | 2-24-1 | -0.176 | -11% | +$8.0705 | -8m 17s | -3.25 |
| opencode/minimax-m2.5 | 11-13-3 | +0.128 | +22% | +$8.3414 | +32s | -7.87 |
| opencode/glm-5.1 | 6-17-4 | +0.062 | +11% | +$7.1011 | -31m 52s | -1.06 |
| opencode/claude-opus-4-6 | 21-5-1 | -0.062 | -11% | -$12.8930 | -7m 17s | -3.88 |
| opencode/claude-sonnet-4-6 | 19-7-1 | +0.016 | +0% | -$2.8579 | -9m 43s | -5.43 |
| opencode/glm-5 | 7-16-4 | -0.014 | +0% | +$2.5488 | +12m 37s | -0.57 |
| opencode/big-pickle | 6-18-3 | -0.006 | +11% | +$8.9827 | -3m 41s | -4.39 |
Benchmark results only matter in context: this section pairs the observed outcomes with the catalog metadata and operating characteristics behind them.
The OpenRouter reference blend for openai/gpt-5.4 is $5.625 per 1M tokens, using a 3:1 input:output token mix.
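A 3:1 blend weights the input price three times as heavily as the output price: blended = (3 × input + output) / 4, all per 1M tokens. The sketch below shows the arithmetic; the $2.50/$15.00 price pair is one illustrative combination that reproduces the $5.625 figure, not a confirmed catalog rate for openai/gpt-5.4.

```python
# Blended per-1M-token price for an N:1 input:output token mix.
def blended_price(input_per_m: float, output_per_m: float, ratio: float = 3.0) -> float:
    """Weight the input price by the mix ratio, then average over all tokens."""
    return (ratio * input_per_m + output_per_m) / (ratio + 1)

# Illustrative prices only (not confirmed for openai/gpt-5.4):
print(blended_price(2.50, 15.00))  # 5.625 USD per 1M tokens
```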