ORPT-Bench model detail
Model benchmark profile

opencode/glm-5.1

This page uses opencode/glm-5.1 as the comparison baseline. Every chart and table below is intended to answer the same question: where this model leads, where it lags, and what it costs in quality, time, and request pressure.

Provider: z-ai · medium price tier · standard · standard
Composite: 0.547 (correctness-weighted overall standing)
Success: 67% (tasks completed successfully)
ORPT: 12.06 (requests per solved task)
Total cost: $1.8816 (observed benchmark spend)
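
These four numbers are aggregates over the per-task records shown further down. Below is a minimal sketch of how they could be computed; the TaskResult fields, the composite-as-mean-score reading, and the assumption that ORPT counts only requests spent on tasks that eventually passed are illustrative guesses, not the benchmark's published definitions.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    solved: bool      # did the task pass its checks
    score: float      # correctness-weighted task score in [0, 1]
    requests: int     # API requests spent on the task
    cost_usd: float   # observed spend on the task

def headline_metrics(results: list[TaskResult]) -> dict:
    """Aggregate per-task records into the four headline numbers (assumed definitions)."""
    solved = [r for r in results if r.solved]
    return {
        # Composite: mean correctness-weighted score across all tasks (assumption).
        "composite": sum(r.score for r in results) / len(results),
        # Success: share of tasks completed successfully.
        "success": len(solved) / len(results),
        # ORPT: requests per solved task, counting only requests on solved tasks (assumption).
        "orpt": sum(r.requests for r in solved) / len(solved) if solved else float("inf"),
        # Total cost: observed benchmark spend.
        "total_cost": sum(r.cost_usd for r in results),
    }
```
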
Baseline comparison

How the field moves relative to opencode/glm-5.1

These charts use opencode/glm-5.1 as zero. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it.

Charts: composite delta, success delta, cost delta, and wall time delta vs baseline.
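
Each chart plots one metric as a per-model difference from the baseline row. A minimal sketch of that calculation follows, assuming the metrics are available as a simple per-model dictionary (names and structure hypothetical):

```python
BASELINE = "opencode/glm-5.1"

def deltas_vs_baseline(metrics: dict[str, dict[str, float]], metric: str) -> dict[str, float]:
    """Return each model's value minus the baseline's value for one metric.

    Positive means the model sits above opencode/glm-5.1 on that metric;
    negative means it trails. `metrics` maps model name -> {metric name -> value}.
    """
    base = metrics[BASELINE][metric]
    return {model: vals[metric] - base for model, vals in metrics.items() if model != BASELINE}

# Example using numbers from the decision table below:
# deltas_vs_baseline({"opencode/glm-5.1": {"composite": 0.547},
#                     "opencode/gpt-5.4-nano": {"composite": 0.789}}, "composite")
# -> {"opencode/gpt-5.4-nano": 0.242}  (up to float rounding)
```
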

Decision table

Field comparison against the baseline

Use this table to decide whether another model beats opencode/glm-5.1 by enough to justify switching.

Model Composite Composite delta Success Success delta ORPT ORPT delta Cost Cost delta Wall time
opencode/gpt-5.4-nano 0.789 +0.242 85% +19% 15.17 +3.12 $0.4215 -$1.4601 27m 33s
opencode/kimi-k2.5 0.785 +0.238 89% +22% 14.25 +2.19 $0.9122 -$0.9694 41m 05s
opencode/claude-opus-4-6 0.67 +0.123 89% +22% 14.88 +2.82 $21.8757 +$19.9941 40m 04s
opencode/glm-5 0.623 +0.076 78% +11% 11.57 -0.48 $6.4339 +$4.5523 20m 10s
opencode/big-pickle 0.615 +0.068 67% +0% 15.39 +3.33 $0.0000 -$1.8816 36m 28s
opencode/gpt-5.4 0.609 +0.062 78% +11% 11.00 -1.06 $8.9827 +$7.1011 32m 47s
opencode/claude-sonnet-4-6 0.593 +0.046 78% +11% 16.43 +4.37 $11.8406 +$9.9589 42m 31s
opencode/glm-5.1 (baseline) 0.547 +0.00 67% +0% 12.06 +0.00 $1.8816 +$0.0000 64m 39s
opencode/minimax-m2.5 0.481 -0.066 56% -11% 18.87 +6.81 $0.6413 -$1.2404 32m 15s
opencode/gpt-5.4-mini 0.425 -0.122 48% -19% 9.54 -2.52 $1.0606 -$0.8210 21m 48s
opencode/minimax-m2.5-free 0.415 -0.132 59% -7% 16.19 +4.13 $0.0000 -$1.8816 41m 34s
opencode/gemini-3-flash 0.415 -0.132 59% -7% 21.81 +9.76 $2.4307 +$0.5490 62m 52s
opencode/gemini-3.1-pro 0.291 -0.256 37% -30% 12.70 +0.64 $5.8536 +$3.9720 51m 25s
opencode/nemotron-3-super-free 0.181 -0.365 26% -41% 19.43 +7.37 $0.0000 -$1.8816 109m 00s
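
One way to turn a row of this table into a switch/no-switch call is a simple threshold rule. The sketch below is illustrative only: the field names mirror the table columns, and the thresholds are arbitrary placeholders rather than anything the benchmark prescribes.

```python
def worth_switching(row: dict, min_composite_gain: float = 0.05,
                    max_extra_cost: float = 1.0) -> bool:
    """Toy decision rule over one decision-table row.

    Keys mirror the table columns; thresholds are placeholders, not benchmark policy.
    """
    return (row["composite_delta"] >= min_composite_gain
            and row["success_delta"] >= 0
            and row["cost_delta"] <= max_extra_cost)

# e.g. the opencode/gpt-5.4-nano row: +0.242 composite, +19% success, -$1.4601 cost
print(worth_switching({"composite_delta": 0.242, "success_delta": 0.19, "cost_delta": -1.4601}))  # True
```
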
Task story

Where opencode/glm-5.1 separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin.

Task Field read Baseline result Winner Winner score Gap to winner Baseline cost Baseline time
RHEL k3s node preparation repair Competitive split failed opencode/gpt-5.4-nano 1.0 1.0 $0.0527 2m 28s
Event status shell summary Competitive split dnf opencode/big-pickle 1.0 1.0 n/a 45s
nftables router ingress repair Competitive split failed opencode/gpt-5.4-nano 0.98 0.98 $0.0540 1m 48s
Docker Compose observability fix Competitive split dnf opencode/gpt-5.4-nano 0.975 0.975 n/a 5m 00s
RHEL edge firewalld router repair Competitive split failed opencode/gpt-5.4-nano 0.953 0.953 $0.0359 54s
Kubernetes OIDC RBAC repair Competitive split failed opencode/gpt-5.4-nano 0.95 0.95 $0.0561 2m 01s
Log audit shell script Competitive split dnf opencode/gpt-5.4-nano 0.935 0.935 n/a 1m 15s
MetalLB ingress address pool repair Competitive split failed opencode/gpt-5.4-nano 0.928 0.928 $0.0468 1m 27s
AppArmor dnsmasq profile repair Competitive split dnf opencode/gpt-5.4-nano 0.918 0.918 n/a 5m 00s
Kubernetes rollout repair Clear separation passed opencode/gpt-5.4-mini 1.0 0.202 $0.1142 3m 58s
SELinux registry volume label repair Clear separation passed opencode/kimi-k2.5 1.0 0.193 $0.1312 3m 51s
GitOps workspace render validation Competitive split passed opencode/big-pickle 0.941 0.166 $0.1239 2m 56s
Terraform static site repair Competitive split passed opencode/kimi-k2.5 0.978 0.162 $0.0482 1m 55s
Bootstrap phase validation repair Competitive split passed opencode/kimi-k2.5 0.993 0.162 $0.1260 3m 36s
Workspace transplant bundle repair Competitive split passed opencode/big-pickle 0.985 0.161 $0.0594 2m 26s
RHEL NetworkManager bridge VLAN repair Competitive split passed opencode/gpt-5.4-nano 0.951 0.158 $0.0689 1m 47s
Build workspace plane convergence Competitive split passed opencode/gpt-5.4-nano 0.942 0.153 $0.1291 3m 01s
K3s registry mirror trust repair Competitive split passed opencode/big-pickle 1.0 0.148 $0.0395 39s
ExternalDNS RFC2136 repair Competitive split passed opencode/kimi-k2.5 0.982 0.145 $0.1022 2m 42s
Pre-ArgoCD bootstrap sequencing Competitive split passed opencode/gpt-5.4-nano 0.967 0.138 $0.1335 2m 58s
MCP OpenBao contract repair Competitive split passed opencode/big-pickle 0.954 0.136 $0.1051 2m 02s
Log level rollup shell script Competitive split passed opencode/big-pickle 0.965 0.129 $0.0450 1m 01s
Ansible nginx role completion Competitive split passed opencode/big-pickle 0.963 0.122 $0.0511 51s
Wildcard TLS route coverage Competitive split passed opencode/kimi-k2.5 0.929 0.119 $0.0840 2m 13s
Workspace runtime access convergence Competitive split passed opencode/gpt-5.4-nano 0.932 0.112 $0.1503 3m 41s
CNPG restore manifest repair Competitive split passed opencode/big-pickle 0.964 0.109 $0.0472 1m 11s
Traefik forwarded header trust repair Competitive split passed opencode/kimi-k2.5 0.913 0.077 $0.0774 3m 13s
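
The ordering described above (unsolved baseline results first, then single-solver tasks, then the largest remaining gaps) can be expressed as a sort key. The sketch below assumes per-task fields named baseline_result, solver_count, and gap; those names are hypothetical.

```python
def reveal_order(task: dict) -> tuple:
    """Sort key matching the described ordering: unsolved baseline results first,
    single-solver tasks next, then descending gap to the winner.
    Field names ('baseline_result', 'solver_count', 'gap') are assumptions."""
    unsolved = task["baseline_result"] in ("failed", "dnf")
    single_solver = task["solver_count"] == 1
    return (not unsolved, not single_solver, -task["gap"])

# usage: tasks.sort(key=reveal_order)
```
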
Head to head

Direct matchups

Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Each edge is the baseline's value minus the challenger's: positive composite and success edges favor opencode/glm-5.1, while positive cost, time, and ORPT edges mean the baseline spends more on that axis.

Challenger Task record Composite edge Success edge Cost edge Time edge ORPT edge
opencode/nemotron-3-super-free 18-0 (9 ties) +0.365 +41% +$1.8816 -44m 21s -7.37
opencode/gemini-3.1-pro 16-4 (7 ties) +0.256 +30% -$3.9720 +13m 14s -0.64
opencode/gpt-5.4-nano 4-23 (0 ties) -0.242 -19% +$1.4601 +37m 07s -3.12
opencode/kimi-k2.5 2-23 (2 ties) -0.238 -22% +$0.9694 +23m 35s -2.19
opencode/minimax-m2.5-free 18-3 (6 ties) +0.132 +7% +$1.8816 +23m 06s -4.13
opencode/gemini-3-flash 18-4 (5 ties) +0.132 +7% -$0.5490 +1m 48s -9.76
opencode/claude-opus-4-6 18-7 (2 ties) -0.123 -22% -$19.9941 +24m 35s -2.82
opencode/gpt-5.4-mini 11-10 (6 ties) +0.122 +19% +$0.8210 +42m 51s +2.52
opencode/glm-5 12-12 (3 ties) -0.076 -11% -$4.5523 +44m 29s +0.48
opencode/big-pickle 6-17 (4 ties) -0.068 +0% +$1.8816 +28m 11s -3.33
opencode/minimax-m2.5 8-13 (6 ties) +0.066 +11% +$1.2404 +32m 24s -6.81
opencode/gpt-5.4 17-6 (4 ties) -0.062 -11% -$7.1011 +31m 52s +1.06
opencode/claude-sonnet-4-6 18-8 (1 tie) -0.046 -11% -$9.9589 +22m 09s -4.37
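
Below is a minimal sketch of how a pairwise record and score edge could be tallied from per-task scores. The per-task score dictionaries, the tie tolerance, and reporting from the baseline's side are assumptions about the underlying data, not the benchmark's exact bookkeeping.

```python
def head_to_head(baseline_scores: dict[str, float],
                 challenger_scores: dict[str, float],
                 tie_eps: float = 1e-9) -> dict:
    """Pairwise comparison over tasks both models attempted.

    Scores are per-task correctness-weighted scores keyed by task name (an
    assumption). Returns wins/losses/ties from the baseline's side plus the
    baseline-minus-challenger mean score edge.
    """
    common = baseline_scores.keys() & challenger_scores.keys()
    wins = losses = ties = 0
    for task in common:
        diff = baseline_scores[task] - challenger_scores[task]
        if abs(diff) <= tie_eps:
            ties += 1
        elif diff > 0:
            wins += 1
        else:
            losses += 1
    edge = (sum(baseline_scores[t] for t in common)
            - sum(challenger_scores[t] for t in common)) / len(common)
    return {"wins": wins, "losses": losses, "ties": ties, "composite_edge": edge}
```
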
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 286
Wall time: 64m 39s
Average task cost: $0.0909
Benchmark support: unknown
Catalog blended price: $2.1463 / 1M tok
Catalog speed: n/a
Intelligence: n/a
Agentic: n/a

Primary blended price is derived automatically from the OpenRouter listing z-ai/glm-5.1 using a 3:1 input:output blend.
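
Assuming the 3:1 blend is a weighted average of the listed input and output prices, the derivation looks like the sketch below; the example prices are placeholders, not the actual z-ai/glm-5.1 listing.

```python
def blended_price_per_mtok(input_price: float, output_price: float) -> float:
    """Blend per-1M-token prices with a 3:1 input:output weighting,
    as described in the footnote above (weighted-average reading assumed)."""
    return (3 * input_price + 1 * output_price) / 4

# Placeholder prices (USD per 1M tokens), not the actual listing:
print(blended_price_per_mtok(1.50, 4.00))  # 2.125
```
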