This page uses opencode/glm-5.1 as the comparison baseline. Every chart and table below answers the same question: where this model leads, where it lags, and what that costs in quality, time, and request pressure.
These charts plot opencode/glm-5.1 as the zero line: positive bars mean another model sits above the baseline on that metric; negative bars mean it trails.
Use this to decide whether another model beats opencode/glm-5.1 by enough to justify the switch.
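For concreteness, the baseline-relative deltas in the table below are plain subtractions. Here is a minimal Python sketch: the `ModelRun` structure is illustrative rather than the benchmark's actual schema, and the sample values come from the table (the displayed percentages appear to be rounded, which is why the printed success delta of 0.18 shows up as +19% below).

```python
from dataclasses import dataclass

@dataclass
class ModelRun:
    name: str
    composite: float  # composite score, higher is better
    success: float    # success rate as a fraction
    cost_usd: float   # total benchmark cost in dollars

def delta_vs_baseline(model: ModelRun, baseline: ModelRun) -> dict[str, float]:
    """Positive deltas mean the model sits above the baseline on that metric."""
    return {
        "composite": round(model.composite - baseline.composite, 4),
        "success": round(model.success - baseline.success, 4),
        "cost_usd": round(model.cost_usd - baseline.cost_usd, 4),
    }

baseline = ModelRun("opencode/glm-5.1", composite=0.547, success=0.67, cost_usd=1.8816)
challenger = ModelRun("opencode/gpt-5.4-nano", composite=0.789, success=0.85, cost_usd=0.4215)
print(delta_vs_baseline(challenger, baseline))
# {'composite': 0.242, 'success': 0.18, 'cost_usd': -1.4601}
```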
| Model | Composite | Delta vs baseline | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time |
|---|---|---|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 0.789 | +0.242 | 85% | +19% | 15.17 | +3.12 | $0.4215 | -$1.4601 | 27m 33s |
| opencode/kimi-k2.5 | 0.785 | +0.238 | 89% | +22% | 14.25 | +2.19 | $0.9122 | -$0.9694 | 41m 05s |
| opencode/claude-opus-4-6 | 0.67 | +0.123 | 89% | +22% | 14.88 | +2.82 | $21.8757 | +$19.9941 | 40m 04s |
| opencode/glm-5 | 0.623 | +0.076 | 78% | +11% | 11.57 | -0.48 | $6.4339 | +$4.5523 | 20m 10s |
| opencode/big-pickle | 0.615 | +0.068 | 67% | +0% | 15.39 | +3.33 | $0.0000 | -$1.8816 | 36m 28s |
| opencode/gpt-5.4 | 0.609 | +0.062 | 78% | +11% | 11.00 | -1.06 | $8.9827 | +$7.1011 | 32m 47s |
| opencode/claude-sonnet-4-6 | 0.593 | +0.046 | 78% | +11% | 16.43 | +4.37 | $11.8406 | +$9.9589 | 42m 31s |
| opencode/glm-5.1 (baseline) | 0.547 | +0.000 | 67% | +0% | 12.06 | +0.00 | $1.8816 | +$0.0000 | 64m 39s |
| opencode/minimax-m2.5 | 0.481 | -0.066 | 56% | -11% | 18.87 | +6.81 | $0.6413 | -$1.2404 | 32m 15s |
| opencode/gpt-5.4-mini | 0.425 | -0.122 | 48% | -19% | 9.54 | -2.52 | $1.0606 | -$0.8210 | 21m 48s |
| opencode/minimax-m2.5-free | 0.415 | -0.132 | 59% | -7% | 16.19 | +4.13 | $0.0000 | -$1.8816 | 41m 34s |
| opencode/gemini-3-flash | 0.415 | -0.132 | 59% | -7% | 21.81 | +9.76 | $2.4307 | +$0.5490 | 62m 52s |
| opencode/gemini-3.1-pro | 0.291 | -0.256 | 37% | -30% | 12.70 | +0.64 | $5.8536 | +$3.9720 | 51m 25s |
| opencode/nemotron-3-super-free | 0.181 | -0.365 | 26% | -41% | 19.43 | +7.37 | $0.0000 | -$1.8816 | 109m 00s |
This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin. The Winner column pairs the winning model with its score; Gap to winner is that score minus the baseline's score on the same task. A sketch of this ordering follows the table.
| Task | Field read | Baseline result | Winner | Gap to winner | Baseline cost | Baseline time |
|---|---|---|---|---|---|---|
| RHEL k3s node preparation repair | Competitive split | failed | opencode/gpt-5.4-nano 1.0 | 1.0 | $0.0527 | 2m 28s |
| Event status shell summary | Competitive split | dnf | opencode/big-pickle 1.0 | 1.0 | n/a | 45s |
| nftables router ingress repair | Competitive split | failed | opencode/gpt-5.4-nano 0.98 | 0.98 | $0.0540 | 1m 48s |
| Docker Compose observability fix | Competitive split | dnf | opencode/gpt-5.4-nano 0.975 | 0.975 | n/a | 5m 00s |
| RHEL edge firewalld router repair | Competitive split | failed | opencode/gpt-5.4-nano 0.953 | 0.953 | $0.0359 | 54s |
| Kubernetes OIDC RBAC repair | Competitive split | failed | opencode/gpt-5.4-nano 0.95 | 0.95 | $0.0561 | 2m 01s |
| Log audit shell script | Competitive split | dnf | opencode/gpt-5.4-nano 0.935 | 0.935 | n/a | 1m 15s |
| MetalLB ingress address pool repair | Competitive split | failed | opencode/gpt-5.4-nano 0.928 | 0.928 | $0.0468 | 1m 27s |
| AppArmor dnsmasq profile repair | Competitive split | dnf | opencode/gpt-5.4-nano 0.918 | 0.918 | n/a | 5m 00s |
| Kubernetes rollout repair | Clear separation | passed | opencode/gpt-5.4-mini 1.0 | 0.202 | $0.1142 | 3m 58s |
| SELinux registry volume label repair | Clear separation | passed | opencode/kimi-k2.5 1.0 | 0.193 | $0.1312 | 3m 51s |
| GitOps workspace render validation | Competitive split | passed | opencode/big-pickle 0.941 | 0.166 | $0.1239 | 2m 56s |
| Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 0.978 | 0.162 | $0.0482 | 1m 55s |
| Bootstrap phase validation repair | Competitive split | passed | opencode/kimi-k2.5 0.993 | 0.162 | $0.1260 | 3m 36s |
| Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle 0.985 | 0.161 | $0.0594 | 2m 26s |
| RHEL NetworkManager bridge VLAN repair | Competitive split | passed | opencode/gpt-5.4-nano 0.951 | 0.158 | $0.0689 | 1m 47s |
| Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano 0.942 | 0.153 | $0.1291 | 3m 01s |
| K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle 1.0 | 0.148 | $0.0395 | 39s |
| ExternalDNS RFC2136 repair | Competitive split | passed | opencode/kimi-k2.5 0.982 | 0.145 | $0.1022 | 2m 42s |
| Pre-ArgoCD bootstrap sequencing | Competitive split | passed | opencode/gpt-5.4-nano 0.967 | 0.138 | $0.1335 | 2m 58s |
| MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle 0.954 | 0.136 | $0.1051 | 2m 02s |
| Log level rollup shell script | Competitive split | passed | opencode/big-pickle 0.965 | 0.129 | $0.0450 | 1m 01s |
| Ansible nginx role completion | Competitive split | passed | opencode/big-pickle 0.963 | 0.122 | $0.0511 | 51s |
| Wildcard TLS route coverage | Competitive split | passed | opencode/kimi-k2.5 0.929 | 0.119 | $0.0840 | 2m 13s |
| Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano 0.932 | 0.112 | $0.1503 | 3m 41s |
| CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle 0.964 | 0.109 | $0.0472 | 1m 11s |
| Traefik forwarded header trust repair | Competitive split | passed | opencode/kimi-k2.5 0.913 | 0.077 | $0.0774 | 3m 13s |
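The Gap to winner column and this ordering can be reproduced from per-task scores. A minimal sketch, assuming each row carries a baseline score and a winner score: the `TaskResult` fields are illustrative, and the 0.798 baseline score below is inferred from the published 1.0 winner score and 0.202 gap rather than taken from the raw results.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    task: str
    baseline_passed: bool  # False for failed/dnf rows
    baseline_score: float  # 0.0 when the baseline failed or did not finish
    winner: str
    winner_score: float

def gap_to_winner(r: TaskResult) -> float:
    """Winner's score minus the baseline's score on the same task."""
    return r.winner_score - r.baseline_score

def sort_key(r: TaskResult) -> tuple[bool, float]:
    # Unsolved rows (baseline failed or dnf) first, then descending gap.
    return (r.baseline_passed, -gap_to_winner(r))

rows = [
    TaskResult("Kubernetes rollout repair", True, 0.798, "opencode/gpt-5.4-mini", 1.0),
    TaskResult("RHEL k3s node preparation repair", False, 0.0, "opencode/gpt-5.4-nano", 1.0),
]
for r in sorted(rows, key=sort_key):
    print(f"{r.task}: gap {gap_to_winner(r):.3f}")
```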
Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Task records read baseline wins first (baseline-challenger, with ties listed separately), and every edge column is baseline minus challenger: positive composite and success edges favor the baseline, while positive cost and time edges mean the baseline spends more money or time. A sketch of the record computation follows the table.
| Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge |
|---|---|---|---|---|---|---|
| opencode/nemotron-3-super-free | 18-0 (9 ties) | +0.365 | +41% | +$1.8816 | -44m 21s | -7.37 |
| opencode/gemini-3.1-pro | 16-4 (7 ties) | +0.256 | +30% | -$3.9720 | +13m 14s | -0.64 |
| opencode/gpt-5.4-nano | 4-23 (0 ties) | -0.242 | -19% | +$1.4601 | +37m 07s | -3.12 |
| opencode/kimi-k2.5 | 2-23 (2 ties) | -0.238 | -22% | +$0.9694 | +23m 35s | -2.19 |
| opencode/minimax-m2.5-free | 18-3 (6 ties) | +0.132 | +7% | +$1.8816 | +23m 06s | -4.13 |
| opencode/gemini-3-flash | 18-4 (5 ties) | +0.132 | +7% | -$0.5490 | +1m 48s | -9.76 |
| opencode/claude-opus-4-6 | 18-7 (2 ties) | -0.123 | -22% | -$19.9941 | +24m 35s | -2.82 |
| opencode/gpt-5.4-mini | 11-10 (6 ties) | +0.122 | +19% | +$0.8210 | +42m 51s | +2.52 |
| opencode/glm-5 | 12-12 (3 ties) | -0.076 | -11% | -$4.5523 | +44m 29s | +0.48 |
| opencode/big-pickle | 6-17 (4 ties) | -0.068 | +0% | +$1.8816 | +28m 11s | -3.33 |
| opencode/minimax-m2.5 | 8-13 (6 ties) | +0.066 | +11% | +$1.2404 | +32m 24s | -6.81 |
| opencode/gpt-5.4 | 17-6 (4 ties) | -0.062 | -11% | -$7.1011 | +31m 52s | +1.06 |
| opencode/claude-sonnet-4-6 | 18-8 (1 tie) | -0.046 | -11% | -$9.9589 | +22m 09s | -4.37 |
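As a sketch of how a task record could be assembled, the helper below compares per-task scores with a tie tolerance across the 27 shared tasks (every record above sums to 27). The tie rule and any cost weighting in the benchmark's real per-task comparison are assumptions here, and the score dictionaries are hypothetical.

```python
def pairwise_record(baseline: dict[str, float],
                    challenger: dict[str, float],
                    tie_eps: float = 1e-9) -> tuple[int, int, int]:
    """Return (baseline wins, challenger wins, ties) across shared tasks."""
    wins = losses = ties = 0
    for task, b_score in baseline.items():
        c_score = challenger[task]
        if abs(b_score - c_score) <= tie_eps:
            ties += 1
        elif b_score > c_score:
            wins += 1
        else:
            losses += 1
    return wins, losses, ties

# Hypothetical scores for two tasks; the real records above span 27 tasks.
baseline_scores = {"task-a": 0.80, "task-b": 0.55}
challenger_scores = {"task-a": 0.80, "task-b": 0.90}
print(pairwise_record(baseline_scores, challenger_scores))  # (0, 1, 1)
```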
A benchmark result only matters in context: this section pairs the observed outcome with the catalog metadata and operating characteristics behind it.
The primary blended price is derived automatically from the OpenRouter listing z-ai/glm-5.1 using a 3:1 input:output blend.
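For reference, a 3:1 blend is just a weighted average of the listed input and output rates. A minimal sketch; the per-million-token prices below are placeholders, not z-ai/glm-5.1's actual OpenRouter pricing.

```python
def blended_price(input_rate: float, output_rate: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices; units pass through unchanged."""
    total_weight = input_weight + output_weight
    return (input_weight * input_rate + output_weight * output_rate) / total_weight

# Hypothetical $0.50/M input and $2.50/M output token rates:
print(blended_price(0.50, 2.50))  # 1.0 -> $1.00 per million tokens blended
```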