This page uses opencode/gpt-5.4-mini as the comparison baseline. Every chart and table below is intended to answer the same question: where this model leads, where it lags, and what it costs in quality, time, and request pressure.
These charts use opencode/gpt-5.4-mini as the zero line: positive bars mean another model sits above the baseline on that metric, negative bars mean it trails.
Use the charts and the table below to decide whether another model beats opencode/gpt-5.4-mini by enough to justify a switch; a sketch of how the delta columns are computed follows the table.
| Model | Composite | Delta vs baseline | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time |
|---|---|---|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 0.789 | +0.365 | 85% | +37% | 15.17 | +5.64 | $0.4215 | -$0.6391 | 27m 33s |
| opencode/kimi-k2.5 | 0.785 | +0.36 | 89% | +41% | 14.25 | +4.71 | $0.9122 | -$0.1484 | 41m 05s |
| opencode/claude-opus-4-6 | 0.67 | +0.245 | 89% | +41% | 14.88 | +5.34 | $21.8757 | +$20.8151 | 40m 04s |
| opencode/glm-5 | 0.623 | +0.198 | 78% | +30% | 11.57 | +2.03 | $6.4339 | +$5.3733 | 20m 10s |
| opencode/big-pickle | 0.615 | +0.19 | 67% | +19% | 15.39 | +5.85 | $0.0000 | -$1.0606 | 36m 28s |
| opencode/gpt-5.4 | 0.609 | +0.184 | 78% | +30% | 11.00 | +1.46 | $8.9827 | +$7.9221 | 32m 47s |
| opencode/claude-sonnet-4-6 | 0.593 | +0.168 | 78% | +30% | 16.43 | +6.89 | $11.8406 | +$10.7800 | 42m 31s |
| opencode/glm-5.1 | 0.547 | +0.122 | 67% | +19% | 12.06 | +2.52 | $1.8816 | +$0.8210 | 64m 39s |
| opencode/minimax-m2.5 | 0.481 | +0.056 | 56% | +7% | 18.87 | +9.33 | $0.6413 | -$0.4193 | 32m 15s |
| opencode/gpt-5.4-mini (baseline) | 0.425 | +0.0 | 48% | +0% | 9.54 | +0.00 | $1.0606 | +$0.0000 | 21m 48s |
| opencode/minimax-m2.5-free | 0.415 | -0.01 | 59% | +11% | 16.19 | +6.65 | $0.0000 | -$1.0606 | 41m 34s |
| opencode/gemini-3-flash | 0.415 | -0.01 | 59% | +11% | 21.81 | +12.27 | $2.4307 | +$1.3701 | 62m 52s |
| opencode/gemini-3.1-pro | 0.291 | -0.134 | 37% | -11% | 12.70 | +3.16 | $5.8536 | +$4.7930 | 51m 25s |
| opencode/nemotron-3-super-free | 0.181 | -0.243 | 26% | -22% | 19.43 | +9.89 | $0.0000 | -$1.0606 | 109m 00s |
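For readers reproducing the delta columns, here is a minimal sketch, assuming each delta is simply challenger minus baseline. The `results` structure is an illustrative assumption, not the benchmark's actual data format; the values are copied from the table above.

```python
# Minimal sketch of the delta columns: challenger value minus baseline value.
# The `results` structure is an illustrative assumption, not the benchmark's
# actual data format; values are hand-copied from the table above.
BASELINE = "opencode/gpt-5.4-mini"

results = {
    "opencode/gpt-5.4-mini": {"composite": 0.425, "success": 0.48, "cost": 1.0606},
    "opencode/gpt-5.4-nano": {"composite": 0.789, "success": 0.85, "cost": 0.4215},
}

def deltas_vs_baseline(model: str) -> dict[str, float]:
    """Positive delta = above the baseline on that metric."""
    base = results[BASELINE]
    return {metric: round(results[model][metric] - base[metric], 4)
            for metric in base}

print(deltas_vs_baseline("opencode/gpt-5.4-nano"))
# {'composite': 0.364, 'success': 0.37, 'cost': -0.6391}
```

The table shows +0.365 for the nano composite delta, presumably because the page subtracts unrounded composites before rounding; the sketch works from the already-rounded display values.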
This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin (a sketch of the ordering follows the table). The Winner column lists the winning model with its score; Gap to winner is the winner's score minus the baseline's.
| Task | Field read | Baseline result | Winner | Gap to winner | Baseline cost | Baseline time |
|---|---|---|---|---|---|---|
| SELinux registry volume label repair | Clear separation | failed | opencode/kimi-k2.5 (1.0) | 1.0 | $0.0326 | 1m 02s |
| RHEL k3s node preparation repair | Competitive split | failed | opencode/gpt-5.4-nano (1.0) | 1.0 | $0.0441 | 1m 03s |
| Bootstrap phase validation repair | Competitive split | failed | opencode/kimi-k2.5 (0.993) | 0.993 | $0.0498 | 45s |
| ExternalDNS RFC2136 repair | Competitive split | failed | opencode/kimi-k2.5 (0.982) | 0.982 | $0.0321 | 39s |
| nftables router ingress repair | Competitive split | failed | opencode/gpt-5.4-nano (0.98) | 0.98 | $0.0310 | 43s |
| Docker Compose observability fix | Competitive split | failed | opencode/gpt-5.4-nano (0.975) | 0.975 | $0.0288 | 33s |
| Pre-ArgoCD bootstrap sequencing | Competitive split | failed | opencode/gpt-5.4-nano (0.967) | 0.967 | $0.0416 | 47s |
| RHEL edge firewalld router repair | Competitive split | failed | opencode/gpt-5.4-nano (0.953) | 0.953 | $0.0277 | 33s |
| GitOps workspace render validation | Competitive split | failed | opencode/big-pickle (0.941) | 0.941 | $0.0597 | 1m 02s |
| Workspace runtime access convergence | Competitive split | failed | opencode/gpt-5.4-nano (0.932) | 0.932 | $0.0655 | 1m 15s |
| Wildcard TLS route coverage | Competitive split | failed | opencode/kimi-k2.5 (0.929) | 0.929 | $0.0350 | 40s |
| MetalLB ingress address pool repair | Competitive split | failed | opencode/gpt-5.4-nano (0.928) | 0.928 | $0.0336 | 45s |
| AppArmor dnsmasq profile repair | Competitive split | failed | opencode/gpt-5.4-nano (0.918) | 0.918 | $0.0337 | 42s |
| Traefik forwarded header trust repair | Competitive split | failed | opencode/kimi-k2.5 (0.913) | 0.913 | $0.0428 | 54s |
| Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 (0.978) | 0.189 | $0.0530 | 2m 17s |
| Log level rollup shell script | Competitive split | passed | opencode/big-pickle (0.965) | 0.136 | $0.0486 | 56s |
| CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle (0.964) | 0.131 | $0.0542 | 39s |
| Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle (0.985) | 0.108 | $0.0344 | 38s |
| K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle (1.0) | 0.104 | $0.0195 | 25s |
| RHEL NetworkManager bridge VLAN repair | Competitive split | passed | opencode/gpt-5.4-nano (0.951) | 0.1 | $0.0339 | 54s |
| Event status shell summary | Competitive split | passed | opencode/big-pickle (1.0) | 0.089 | $0.0196 | 25s |
| Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano (0.942) | 0.079 | $0.0574 | 57s |
| Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano (0.95) | 0.078 | $0.0626 | 56s |
| Ansible nginx role completion | Competitive split | passed | opencode/big-pickle (0.963) | 0.045 | $0.0285 | 28s |
| MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle (0.954) | 0.038 | $0.0346 | 42s |
| Log audit shell script | Competitive split | passed | opencode/gpt-5.4-nano (0.935) | 0.02 | $0.0269 | 33s |
| Kubernetes rollout repair | Clear separation | passed | opencode/gpt-5.4-mini (1.0) | 0.0 | $0.0292 | 34s |
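As promised above, here is a minimal sketch of the ordering, under two assumptions: failed-by-baseline tasks sort first, and within each group larger gaps come first. The `rows` excerpt and its field names are hypothetical stand-ins for the page's real data.

```python
# Hypothetical sketch of the task-table ordering: baseline failures first,
# then descending gap to the winner. Field names are assumptions.
rows = [
    {"task": "Kubernetes rollout repair", "baseline_passed": True, "gap": 0.0},
    {"task": "SELinux registry volume label repair", "baseline_passed": False, "gap": 1.0},
    {"task": "Terraform static site repair", "baseline_passed": True, "gap": 0.189},
]

# False sorts before True, so failed tasks lead; -gap puts large gaps first.
rows.sort(key=lambda r: (r["baseline_passed"], -r["gap"]))
for r in rows:
    print(f"{r['task']}: passed={r['baseline_passed']}, gap={r['gap']}")
```

This reproduces the visible order: every failed task precedes every passed one, and the gap is non-increasing within each group.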
Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Task records read baseline wins first, so 5-19 (3 ties) means the baseline won 5 tasks and the challenger 19, with 3 ties; each edge column is the baseline's value minus the challenger's, so a negative composite or success edge means the challenger scores higher, and a positive cost edge means the challenger is cheaper. A sketch of how the records are tallied follows the table.
| Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge |
|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 5-19 (3 ties) | -0.365 | -37% | +$0.6391 | -5m 44s | -5.64 |
| opencode/kimi-k2.5 | 7-19 (1 tie) | -0.36 | -41% | +$0.1484 | -19m 16s | -4.71 |
| opencode/claude-opus-4-6 | 13-12 (2 ties) | -0.245 | -41% | -$20.8151 | -18m 16s | -5.34 |
| opencode/nemotron-3-super-free | 13-1 (13 ties) | +0.243 | +22% | +$1.0606 | -87m 12s | -9.89 |
| opencode/glm-5 | 13-9 (5 ties) | -0.198 | -30% | -$5.3733 | +1m 38s | -2.03 |
| opencode/big-pickle | 4-17 (6 ties) | -0.19 | -19% | +$1.0606 | -14m 40s | -5.85 |
| opencode/gpt-5.4 | 13-8 (6 ties) | -0.184 | -30% | -$7.9221 | -10m 59s | -1.46 |
| opencode/claude-sonnet-4-6 | 13-10 (4 ties) | -0.168 | -30% | -$10.7800 | -20m 42s | -6.89 |
| opencode/gemini-3.1-pro | 12-3 (12 ties) | +0.134 | +11% | -$4.7930 | -29m 37s | -3.16 |
| opencode/glm-5.1 | 10-11 (6 ties) | -0.122 | -19% | -$0.8210 | -42m 51s | -2.52 |
| opencode/minimax-m2.5 | 8-12 (7 ties) | -0.056 | -7% | +$0.4193 | -10m 27s | -9.33 |
| opencode/minimax-m2.5-free | 13-6 (8 ties) | +0.01 | -11% | +$1.0606 | -19m 45s | -6.65 |
| opencode/gemini-3-flash | 13-10 (4 ties) | +0.01 | -11% | -$1.3701 | -41m 03s | -12.27 |
The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.
The primary blended price is derived automatically from the OpenRouter listing for openai/gpt-5.4-mini, using a 3:1 input:output token blend; a worked example of the blend follows.
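A minimal sketch of that blend, assuming it means a weighted average of three parts input price to one part output price. The per-million-token prices below are placeholders, not the actual openai/gpt-5.4-mini listing.

```python
# Hypothetical 3:1 input:output price blend. The prices are placeholders,
# not the actual openai/gpt-5.4-mini OpenRouter listing.
def blended_price(input_per_mtok: float, output_per_mtok: float) -> float:
    """Weighted average: three parts input price, one part output price."""
    return (3 * input_per_mtok + output_per_mtok) / 4

print(blended_price(0.25, 2.00))  # 0.6875 ($ per million tokens)
```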
The model has been observed to complete ORPT-Bench scripting smoke runs cleanly and is the current preferred headless dev baseline.