This page uses opencode/gpt-5.4-nano as the comparison baseline. Every chart and table below answers the same question: where this model leads, where it lags, and what that costs in quality, time, and request pressure.
These charts anchor opencode/gpt-5.4-nano at zero. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it.
Use this to decide whether another model beats opencode/gpt-5.4-nano by enough to justify switching (a sketch of the delta arithmetic follows the table).
| Model | Composite | Delta vs baseline | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time |
|---|---|---|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano (baseline) | 0.789 | +0.000 | 85% | +0% | 15.17 | +0.00 | $0.4215 | +$0.0000 | 27m 33s |
| opencode/kimi-k2.5 | 0.785 | -0.005 | 89% | +4% | 14.25 | -0.92 | $0.9122 | +$0.4907 | 41m 05s |
| opencode/claude-opus-4-6 | 0.67 | -0.119 | 89% | +4% | 14.88 | -0.30 | $21.8757 | +$21.4541 | 40m 04s |
| opencode/glm-5 | 0.623 | -0.166 | 78% | -7% | 11.57 | -3.60 | $6.4339 | +$6.0124 | 20m 10s |
| opencode/big-pickle | 0.615 | -0.174 | 67% | -19% | 15.39 | +0.21 | $0.0000 | -$0.4215 | 36m 28s |
| opencode/gpt-5.4 | 0.609 | -0.181 | 78% | -7% | 11.00 | -4.17 | $8.9827 | +$8.5611 | 32m 47s |
| opencode/claude-sonnet-4-6 | 0.593 | -0.197 | 78% | -7% | 16.43 | +1.25 | $11.8406 | +$11.4190 | 42m 31s |
| opencode/glm-5.1 | 0.547 | -0.242 | 67% | -19% | 12.06 | -3.12 | $1.8816 | +$1.4601 | 64m 39s |
| opencode/minimax-m2.5 | 0.481 | -0.308 | 56% | -30% | 18.87 | +3.69 | $0.6413 | +$0.2197 | 32m 15s |
| opencode/gpt-5.4-mini | 0.425 | -0.365 | 48% | -37% | 9.54 | -5.64 | $1.0606 | +$0.6391 | 21m 48s |
| opencode/minimax-m2.5-free | 0.415 | -0.375 | 59% | -26% | 16.19 | +1.01 | $0.0000 | -$0.4215 | 41m 34s |
| opencode/gemini-3-flash | 0.415 | -0.375 | 59% | -26% | 21.81 | +6.64 | $2.4307 | +$2.0091 | 62m 52s |
| opencode/gemini-3.1-pro | 0.291 | -0.498 | 37% | -48% | 12.70 | -2.47 | $5.8536 | +$5.4321 | 51m 25s |
| opencode/nemotron-3-super-free | 0.181 | -0.608 | 26% | -59% | 19.43 | +4.25 | $0.0000 | -$0.4215 | 109m 00s |
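As a rough sketch of how the delta columns above can be reproduced, the snippet below subtracts the baseline's metrics from each model's. The `ModelRun` fields and example values are illustrative, not the benchmark's actual pipeline, and small rounding differences against the table are expected.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    name: str
    composite: float  # 0..1 quality score
    success: float    # fraction of tasks passed
    cost_usd: float   # total spend across the suite


def deltas_vs_baseline(runs, baseline_name):
    """Report each run's metrics relative to the named baseline."""
    baseline = next(r for r in runs if r.name == baseline_name)
    return {
        r.name: {
            "composite": round(r.composite - baseline.composite, 3),
            "success": round(r.success - baseline.success, 2),
            "cost_usd": round(r.cost_usd - baseline.cost_usd, 4),
        }
        for r in runs
    }


runs = [
    ModelRun("opencode/gpt-5.4-nano", 0.789, 0.85, 0.4215),
    ModelRun("opencode/kimi-k2.5", 0.785, 0.89, 0.9122),
]
print(deltas_vs_baseline(runs, "opencode/gpt-5.4-nano"))
```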
This table puts the most revealing tasks first: unsolved tasks, then single-solver tasks, then tasks where the baseline trails the winner by a meaningful margin (a sketch of this ordering follows the table).
| Task | Field read | Baseline result | Winner | Gap to winner | Baseline cost | Baseline time |
|---|---|---|---|---|---|---|
| SELinux registry volume label repair | Clear separation | failed | opencode/kimi-k2.5 (1.0) | 1.0 | $0.0125 | 59s |
| Kubernetes rollout repair | Clear separation | failed | opencode/gpt-5.4-mini (1.0) | 1.0 | $0.0199 | 1m 24s |
| Bootstrap phase validation repair | Competitive split | failed | opencode/kimi-k2.5 (0.993) | 0.993 | $0.0231 | 1m 21s |
| ExternalDNS RFC2136 repair | Competitive split | failed | opencode/kimi-k2.5 (0.982) | 0.982 | $0.0138 | 49s |
| Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 (0.978) | 0.156 | $0.0175 | 1m 50s |
| K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle (1.0) | 0.114 | $0.0089 | 32s |
| Event status shell summary | Competitive split | passed | opencode/big-pickle (1.0) | 0.092 | $0.0089 | 34s |
| Ansible nginx role completion | Competitive split | passed | opencode/big-pickle (0.963) | 0.074 | $0.0131 | 48s |
| Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle (0.985) | 0.06 | $0.0127 | 44s |
| MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle (0.954) | 0.054 | $0.0156 | 58s |
| CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle (0.964) | 0.049 | $0.0128 | 45s |
| GitOps workspace render validation | Competitive split | passed | opencode/big-pickle (0.941) | 0.046 | $0.0159 | 1m 01s |
| Log level rollup shell script | Competitive split | passed | opencode/big-pickle (0.965) | 0.044 | $0.0113 | 49s |
| Wildcard TLS route coverage | Competitive split | passed | opencode/kimi-k2.5 (0.929) | 0.013 | $0.0159 | 53s |
| Traefik forwarded header trust repair | Competitive split | passed | opencode/kimi-k2.5 (0.913) | 0.009 | $0.0176 | 1m 17s |
| RHEL k3s node preparation repair | Competitive split | passed | opencode/gpt-5.4-nano (1.0) | 0.0 | $0.0120 | 38s |
| Docker Compose observability fix | Competitive split | passed | opencode/gpt-5.4-nano (0.975) | 0.0 | $0.0286 | 2m 08s |
| nftables router ingress repair | Competitive split | passed | opencode/gpt-5.4-nano (0.98) | 0.0 | $0.0084 | 27s |
| RHEL NetworkManager bridge VLAN repair | Competitive split | passed | opencode/gpt-5.4-nano (0.951) | 0.0 | $0.0090 | 34s |
| Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano (0.942) | 0.0 | $0.0149 | 46s |
| Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano (0.95) | 0.0 | $0.0172 | 1m 13s |
| RHEL edge firewalld router repair | Competitive split | passed | opencode/gpt-5.4-nano (0.953) | 0.0 | $0.0210 | 1m 10s |
| Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano (0.932) | 0.0 | $0.0301 | 1m 47s |
| Pre-ArgoCD bootstrap sequencing | Competitive split | passed | opencode/gpt-5.4-nano (0.967) | 0.0 | $0.0202 | 1m 13s |
| AppArmor dnsmasq profile repair | Competitive split | passed | opencode/gpt-5.4-nano (0.918) | 0.0 | $0.0145 | 1m 03s |
| Log audit shell script | Competitive split | passed | opencode/gpt-5.4-nano (0.935) | 0.0 | $0.0107 | 48s |
| MetalLB ingress address pool repair | Competitive split | passed | opencode/gpt-5.4-nano (0.928) | 0.0 | $0.0152 | 1m 03s |
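A minimal sketch of this ordering, assuming per-task fields like `solver_count` and `gap_to_winner` that the report does not expose directly:

```python
from dataclasses import dataclass


@dataclass
class TaskRow:
    name: str
    solver_count: int     # how many models passed the task
    gap_to_winner: float  # winner's score minus the baseline's


def revealingness(row: TaskRow) -> tuple:
    # False sorts before True, so unsolved tasks (solver_count == 0)
    # come first, then single-solver tasks, then the widest gaps.
    return (row.solver_count != 0, row.solver_count != 1, -row.gap_to_winner)


def order_tasks(rows: list[TaskRow]) -> list[TaskRow]:
    return sorted(rows, key=revealingness)
```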
Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Records count baseline wins first; every edge column is the baseline value minus the challenger's, so positive composite and success edges favor the baseline, while positive cost and time edges mean the baseline spends or waits more (a sketch of the arithmetic follows the table).
| Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge |
|---|---|---|---|---|---|---|
| opencode/nemotron-3-super-free | 23-0 (4 ties) | +0.608 | +59% | +$0.4215 | -81m 28s | -4.25 |
| opencode/gemini-3.1-pro | 22-1 (4 ties) | +0.498 | +48% | -$5.4321 | -23m 52s | +2.47 |
| opencode/minimax-m2.5-free | 23-2 (2 ties) | +0.375 | +26% | +$0.4215 | -14m 01s | -1.01 |
| opencode/gemini-3-flash | 23-3 (1 tie) | +0.375 | +26% | -$2.0091 | -35m 19s | -6.64 |
| opencode/gpt-5.4-mini | 19-5 (3 ties) | +0.365 | +37% | -$0.6391 | +5m 44s | +5.64 |
| opencode/minimax-m2.5 | 18-6 (3 ties) | +0.308 | +30% | -$0.2197 | -4m 42s | -3.69 |
| opencode/glm-5.1 | 23-4 (0 ties) | +0.242 | +19% | -$1.4601 | -37m 07s | +3.12 |
| opencode/claude-sonnet-4-6 | 23-1 (3 ties) | +0.197 | +7% | -$11.4190 | -14m 58s | -1.25 |
| opencode/gpt-5.4 | 23-3 (1 tie) | +0.181 | +7% | -$8.5611 | -5m 15s | +4.17 |
| opencode/big-pickle | 14-10 (3 ties) | +0.174 | +19% | +$0.4215 | -8m 56s | -0.21 |
| opencode/glm-5 | 23-2 (2 ties) | +0.166 | +7% | -$6.0124 | +7m 22s | +3.60 |
| opencode/claude-opus-4-6 | 23-3 (1 tie) | +0.119 | -4% | -$21.4541 | -12m 31s | +0.30 |
| opencode/kimi-k2.5 | 19-8 (0 ties) | +0.005 | -4% | -$0.4907 | -13m 32s | +0.92 |
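A minimal sketch of the record and edge arithmetic, assuming per-task scores in 0..1 and exact equality for ties (the suite's real tie rule is not published here):

```python
def task_record(baseline_scores: dict[str, float],
                challenger_scores: dict[str, float]) -> str:
    """Count per-task wins for the baseline against one challenger."""
    wins = losses = ties = 0
    for task, b in baseline_scores.items():
        c = challenger_scores[task]
        if b == c:
            ties += 1
        elif b > c:
            wins += 1
        else:
            losses += 1
    return f"{wins}-{losses} ({ties} ties)"


def edge(baseline_value: float, challenger_value: float) -> float:
    # Every edge column is baseline minus challenger: positive composite
    # or success edges favor the baseline; positive cost or time edges
    # mean the baseline spends or waits more.
    return baseline_value - challenger_value
```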
Benchmark results only matter in context: this section pairs the observed outcomes with the catalog metadata and operating characteristics behind them.
The primary blended price is derived automatically from the OpenRouter listing for openai/gpt-5.4-nano, using a 3:1 input:output token blend.
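As a sketch, the blend weights input tokens three to one against output tokens; the per-million-token prices below are placeholders, not the live listing:

```python
def blended_price(input_per_mtok: float, output_per_mtok: float) -> float:
    """3:1 input:output blend of per-million-token prices."""
    return (3 * input_per_mtok + output_per_mtok) / 4


# Hypothetical listing values for illustration only.
print(blended_price(0.05, 0.40))  # -> 0.1375
```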