This page uses opencode/claude-opus-4-6 as the comparison baseline. Every chart and table below answers the same question: where this model leads, where it lags, and what it costs in quality, time, and request pressure.
The charts treat opencode/claude-opus-4-6 as zero: positive bars mean a model is above the baseline on that metric; negative bars mean it trails.
Use this page to decide whether another model beats opencode/claude-opus-4-6 by enough to justify switching.
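As a concrete check on how the delta columns work, here is a minimal Python sketch, assuming each delta is simply the model's value minus the baseline's. The `deltas` helper and metric keys are illustrative, not part of the benchmark harness; small disagreements with the published deltas come from the table's values already being rounded.

```python
# Each delta column is the model's value minus the baseline's value.
# BASELINE values are copied from the table below; the helper itself
# is illustrative and not part of any benchmark tooling.
BASELINE = {"composite": 0.670, "success": 0.89, "orpt": 14.88, "cost": 21.8757}

def deltas(model: dict, baseline: dict = BASELINE) -> dict:
    """Metric-by-metric difference: positive means above the baseline."""
    return {k: round(model[k] - baseline[k], 4) for k in baseline}

gpt_54_nano = {"composite": 0.789, "success": 0.85, "orpt": 15.17, "cost": 0.4215}
print(deltas(gpt_54_nano))
# {'composite': 0.119, 'success': -0.04, 'orpt': 0.29, 'cost': -21.4542}
```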
| Model | Composite | Delta vs baseline | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time |
|---|---|---|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 0.789 | +0.119 | 85% | -4% | 15.17 | +0.30 | $0.4215 | -$21.4541 | 27m 33s |
| opencode/kimi-k2.5 | 0.785 | +0.114 | 89% | +0% | 14.25 | -0.63 | $0.9122 | -$20.9634 | 41m 05s |
| opencode/claude-opus-4-6 (baseline) | 0.670 | +0.000 | 89% | +0% | 14.88 | +0.00 | $21.8757 | +$0.0000 | 40m 04s |
| opencode/glm-5 | 0.623 | -0.047 | 78% | -11% | 11.57 | -3.30 | $6.4339 | -$15.4417 | 20m 10s |
| opencode/big-pickle | 0.615 | -0.055 | 67% | -22% | 15.39 | +0.51 | $0.0000 | -$21.8757 | 36m 28s |
| opencode/gpt-5.4 | 0.609 | -0.062 | 78% | -11% | 11.00 | -3.88 | $8.9827 | -$12.8930 | 32m 47s |
| opencode/claude-sonnet-4-6 | 0.593 | -0.077 | 78% | -11% | 16.43 | +1.55 | $11.8406 | -$10.0351 | 42m 31s |
| opencode/glm-5.1 | 0.547 | -0.123 | 67% | -22% | 12.06 | -2.82 | $1.8816 | -$19.9941 | 64m 39s |
| opencode/minimax-m2.5 | 0.481 | -0.189 | 56% | -33% | 18.87 | +3.99 | $0.6413 | -$21.2344 | 32m 15s |
| opencode/gpt-5.4-mini | 0.425 | -0.245 | 48% | -41% | 9.54 | -5.34 | $1.0606 | -$20.8151 | 21m 48s |
| opencode/minimax-m2.5-free | 0.415 | -0.255 | 59% | -30% | 16.19 | +1.31 | $0.0000 | -$21.8757 | 41m 34s |
| opencode/gemini-3-flash | 0.415 | -0.255 | 59% | -30% | 21.81 | +6.94 | $2.4307 | -$19.4450 | 62m 52s |
| opencode/gemini-3.1-pro | 0.291 | -0.379 | 37% | -52% | 12.70 | -2.18 | $5.8536 | -$16.0221 | 51m 25s |
| opencode/nemotron-3-super-free | 0.181 | -0.489 | 26% | -63% | 19.43 | +4.55 | $0.0000 | -$21.8757 | 109m 00s |
This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin. The gap is the winner's score minus the baseline's, so on the three failed tasks it equals the winner's full score; a simplified sketch of the gap ordering follows the table.
| Task | Field read | Baseline result | Winner (score) | Gap to winner | Baseline cost | Baseline time |
|---|---|---|---|---|---|---|
| Kubernetes rollout repair | Clear separation | failed | opencode/gpt-5.4-mini (1.0) | 1.0 | $0.7268 | 56s |
| Docker Compose observability fix | Competitive split | failed | opencode/gpt-5.4-nano (0.975) | 0.975 | $1.2297 | 4m 16s |
| MetalLB ingress address pool repair | Competitive split | failed | opencode/gpt-5.4-nano (0.928) | 0.928 | $0.7403 | 1m 04s |
| K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle (1.0) | 0.263 | $0.6582 | 59s |
| RHEL k3s node preparation repair | Competitive split | passed | opencode/gpt-5.4-nano (1.0) | 0.255 | $0.9392 | 2m 03s |
| Event status shell summary | Competitive split | passed | opencode/big-pickle (1.0) | 0.251 | $0.5796 | 38s |
| nftables router ingress repair | Competitive split | passed | opencode/gpt-5.4-nano (0.98) | 0.245 | $0.7916 | 1m 07s |
| Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle (0.985) | 0.241 | $0.7076 | 1m 24s |
| SELinux registry volume label repair | Clear separation | passed | opencode/kimi-k2.5 (1.0) | 0.241 | $0.7973 | 1m 57s |
| CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle (0.964) | 0.228 | $0.9118 | 1m 29s |
| Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 (0.978) | 0.223 | $0.6775 | 46s |
| Ansible nginx role completion | Competitive split | passed | opencode/big-pickle (0.963) | 0.216 | $0.7302 | 1m 02s |
| ExternalDNS RFC2136 repair | Competitive split | passed | opencode/kimi-k2.5 (0.982) | 0.215 | $0.7899 | 1m 44s |
| RHEL NetworkManager bridge VLAN repair | Competitive split | passed | opencode/gpt-5.4-nano (0.951) | 0.21 | $0.7664 | 1m 01s |
| MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle (0.954) | 0.207 | $0.7456 | 1m 22s |
| Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano (0.942) | 0.206 | $1.1061 | 2m 42s |
| Pre-ArgoCD bootstrap sequencing | Competitive split | passed | opencode/gpt-5.4-nano (0.967) | 0.205 | $0.8700 | 2m 28s |
| Log level rollup shell script | Competitive split | passed | opencode/big-pickle (0.965) | 0.205 | $0.5953 | 50s |
| Bootstrap phase validation repair | Competitive split | passed | opencode/kimi-k2.5 (0.993) | 0.2 | $0.8939 | 1m 20s |
| GitOps workspace render validation | Competitive split | passed | opencode/big-pickle (0.941) | 0.194 | $0.9304 | 1m 17s |
| Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano (0.95) | 0.185 | $0.8019 | 1m 13s |
| Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano (0.932) | 0.178 | $1.1706 | 3m 01s |
| RHEL edge firewalld router repair | Competitive split | passed | opencode/gpt-5.4-nano (0.953) | 0.177 | $0.7708 | 1m 17s |
| Log audit shell script | Competitive split | passed | opencode/gpt-5.4-nano (0.935) | 0.172 | $0.6065 | 39s |
| Wildcard TLS route coverage | Competitive split | passed | opencode/kimi-k2.5 (0.929) | 0.168 | $0.7877 | 1m 00s |
| AppArmor dnsmasq profile repair | Competitive split | passed | opencode/gpt-5.4-nano (0.918) | 0.167 | $0.8037 | 1m 21s |
| Traefik forwarded header trust repair | Competitive split | passed | opencode/kimi-k2.5 (0.913) | 0.147 | $0.7472 | 1m 08s |
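The gap-based part of the ordering can be reproduced with a short sort. The sketch below assumes each task record carries the baseline's pass/fail result and score alongside the winner's score; the field names are illustrative, not the harness's actual schema.

```python
# Hypothetical sketch of the task ordering: tasks the baseline failed
# come first, then passed tasks ranked by how far the baseline trails
# the winner. Field names are illustrative, not the real schema.
tasks = [
    {"name": "K3s registry mirror trust repair", "baseline_passed": True,
     "baseline_score": 0.737, "winner_score": 1.0},
    {"name": "Kubernetes rollout repair", "baseline_passed": False,
     "baseline_score": 0.0, "winner_score": 1.0},
]

def gap_to_winner(task: dict) -> float:
    # A failed baseline run scores zero, so its gap collapses to the
    # winner's full score, matching the first three rows above.
    return task["winner_score"] - task["baseline_score"]

# False sorts before True, so failed tasks lead; bigger gaps come first.
ordered = sorted(tasks, key=lambda t: (t["baseline_passed"], -gap_to_winner(t)))
for t in ordered:
    print(f'{t["name"]}: gap {gap_to_winner(t):.3f}')
```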
Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Records and edges are stated from the baseline's perspective: a 24-0 record means the baseline won 24 of the 27 tasks head-to-head, and every edge is the baseline's value minus the challenger's (a sketch of this arithmetic follows the table).
| Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge |
|---|---|---|---|---|---|---|
| opencode/nemotron-3-super-free | 24-0 (3 ties) | +0.489 | +63% | +$21.8757 | -68m 56s | -4.55 |
| opencode/gemini-3.1-pro | 17-8 (2 ties) | +0.379 | +52% | +$16.0221 | -11m 21s | +2.18 |
| opencode/minimax-m2.5-free | 24-2 (1 tie) | +0.255 | +30% | +$21.8757 | -1m 29s | -1.31 |
| opencode/gemini-3-flash | 24-1 (2 ties) | +0.255 | +30% | +$19.4450 | -22m 47s | -6.94 |
| opencode/gpt-5.4-mini | 12-13 (2 ties) | +0.245 | +41% | +$20.8151 | +18m 16s | +5.34 |
| opencode/minimax-m2.5 | 10-14 (3 ties) | +0.189 | +33% | +$21.2344 | +7m 49s | -3.99 |
| opencode/glm-5.1 | 7-18 (2 ties) | +0.123 | +22% | +$19.9941 | -24m 35s | +2.82 |
| opencode/gpt-5.4-nano | 3-23 (1 tie) | -0.119 | +4% | +$21.4541 | +12m 31s | -0.30 |
| opencode/kimi-k2.5 | 3-24 (0 ties) | -0.114 | +0% | +$20.9634 | -1m 00s | +0.63 |
| opencode/claude-sonnet-4-6 | 7-18 (2 ties) | +0.077 | +11% | +$10.0351 | -2m 26s | -1.55 |
| opencode/gpt-5.4 | 5-21 (1 tie) | +0.062 | +11% | +$12.8930 | +7m 17s | +3.88 |
| opencode/big-pickle | 7-18 (2 ties) | +0.055 | +22% | +$21.8757 | +3m 36s | -0.51 |
| opencode/glm-5 | 6-20 (1 tie) | +0.047 | +11% | +$15.4417 | +19m 54s | +3.30 |
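Because the meaning of a positive sign differs across columns (a positive composite edge favors the baseline, while a positive cost edge means the baseline spends more), here is the arithmetic spelled out as a minimal sketch. It assumes every edge is baseline minus challenger; the numbers are copied from the tables above and the `edge` function is illustrative.

```python
# Every edge column is baseline value minus challenger value. Whether a
# positive edge is good for the baseline depends on the metric: yes for
# composite and success, no for cost (baseline more expensive).
def edge(baseline_value: float, challenger_value: float) -> float:
    return round(baseline_value - challenger_value, 4)

baseline = {"composite": 0.670, "success": 89, "cost": 21.8757, "orpt": 14.88}
nemotron = {"composite": 0.181, "success": 26, "cost": 0.0000, "orpt": 19.43}

for metric in baseline:
    print(metric, edge(baseline[metric], nemotron[metric]))
# composite 0.489 / success 63 / cost 21.8757 / orpt -4.55
```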
A benchmark result only matters in context: this section pairs the observed outcome with the catalog metadata and operating characteristics behind it.
The OpenRouter reference blend for anthropic/claude-opus-4.6-fast is $60 per 1M tokens, assuming a 3:1 input:output token mix.
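To make the blend concrete: a 3:1 input:output mix usually means the blended rate is a weighted average of the per-direction prices, three parts input to one part output. The sketch below uses placeholder per-direction prices chosen only so the arithmetic reproduces the $60/1M figure; they are not published rates for anthropic/claude-opus-4.6-fast.

```python
# Blended $/1M tokens under a 3:1 input:output mix. INPUT_PER_M and
# OUTPUT_PER_M are placeholders picked to reproduce the $60 blend;
# they are NOT the model's published per-direction prices.
INPUT_PER_M = 30.0    # hypothetical $/1M input tokens
OUTPUT_PER_M = 150.0  # hypothetical $/1M output tokens

def blended_per_m(input_price: float, output_price: float,
                  in_parts: int = 3, out_parts: int = 1) -> float:
    """Weighted average of per-direction prices over the token mix."""
    return (in_parts * input_price + out_parts * output_price) / (in_parts + out_parts)

print(blended_per_m(INPUT_PER_M, OUTPUT_PER_M))  # -> 60.0
```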