ORPT-Bench model detail
Model benchmark profile

opencode/gpt-5.4-nano

This page uses opencode/gpt-5.4-nano as the comparison baseline. Every chart and table below answers the same questions: where this model leads, where it lags, and what it costs in quality, time, and request pressure.

openai · low price tier · dev-cheap · standard

Composite: 0.789 (correctness-weighted overall standing)
Success: 85% (tasks completed successfully)
ORPT: 15.17 (requests per solved task)
Total cost: $0.4215 (observed benchmark spend)
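ORPT is the request-efficiency figure used throughout this page. As a minimal sketch of the definition given above (requests per solved task), it is just the total request count divided by the number of solved tasks; the counts below are hypothetical placeholders, not this benchmark's raw data.

```python
# Hedged sketch of the ORPT metric defined above: requests per solved task.
# The counts are hypothetical placeholders, not this benchmark's raw numbers.
total_requests = 400   # hypothetical request count across the run
solved_tasks = 26      # hypothetical number of tasks marked solved

orpt = total_requests / solved_tasks
print(f"ORPT: {orpt:.2f}")   # ORPT: 15.38 for these placeholder counts
```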
Baseline comparison

How the field moves relative to opencode/gpt-5.4-nano

These charts use opencode/gpt-5.4-nano as zero. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it.
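As a minimal sketch of how each bar is derived, a delta is simply the challenger's metric minus the baseline's. The results mapping below is a hypothetical stand-in for the benchmark's score store; the values shown are the success rates from the decision table.

```python
# Minimal sketch of the baseline-delta computation behind these charts.
# The results mapping is a hypothetical stand-in; the values are the
# success rates from the decision table below.
BASELINE = "opencode/gpt-5.4-nano"

results = {
    "opencode/gpt-5.4-nano": {"success": 0.85},
    "opencode/kimi-k2.5":    {"success": 0.89},
}

def delta(model: str, metric: str) -> float:
    """Positive means the model is above the baseline on this metric."""
    return results[model][metric] - results[BASELINE][metric]

print(f"{delta('opencode/kimi-k2.5', 'success'):+.0%}")   # +4%, matching the table
```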

Charts: Composite delta vs baseline · Success delta vs baseline · Cost delta vs baseline · Wall time delta vs baseline

Decision table

Field comparison against the baseline

Use this to decide whether another model beats opencode/gpt-5.4-nano enough to justify the change.
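One hedged way to apply the table programmatically is to screen each challenger on its composite and cost deltas before weighing anything else. The helper and thresholds below are illustrative assumptions, not part of the benchmark; the two example rows copy their delta values from the table.

```python
# Hedged sketch: screen challengers using the decision table's delta columns.
# Thresholds are illustrative assumptions; the delta values are copied from
# the table below (composite delta, cost delta vs the baseline).
candidates = {
    "opencode/kimi-k2.5":  {"composite_delta": -0.005, "cost_delta": +0.4907},
    "opencode/big-pickle": {"composite_delta": -0.174, "cost_delta": -0.4215},
}

def worth_switching(deltas, min_composite_gain=0.0, max_extra_cost=1.0):
    # A challenger must at least match the baseline's composite score and
    # stay inside the extra-cost budget to make the shortlist.
    return (deltas["composite_delta"] >= min_composite_gain
            and deltas["cost_delta"] <= max_extra_cost)

shortlist = [name for name, d in candidates.items() if worth_switching(d)]
print(shortlist)   # [] -- neither example row clears the composite bar
```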

Model | Composite | Delta vs baseline | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time
opencode/gpt-5.4-nano (baseline) | 0.789 | +0.0 | 85% | +0% | 15.17 | +0.00 | $0.4215 | +$0.0000 | 27m 33s
opencode/kimi-k2.5 | 0.785 | -0.005 | 89% | +4% | 14.25 | -0.92 | $0.9122 | +$0.4907 | 41m 05s
opencode/claude-opus-4-6 | 0.67 | -0.119 | 89% | +4% | 14.88 | -0.30 | $21.8757 | +$21.4541 | 40m 04s
opencode/glm-5 | 0.623 | -0.166 | 78% | -7% | 11.57 | -3.60 | $6.4339 | +$6.0124 | 20m 10s
opencode/big-pickle | 0.615 | -0.174 | 67% | -19% | 15.39 | +0.21 | $0.0000 | -$0.4215 | 36m 28s
opencode/gpt-5.4 | 0.609 | -0.181 | 78% | -7% | 11.00 | -4.17 | $8.9827 | +$8.5611 | 32m 47s
opencode/claude-sonnet-4-6 | 0.593 | -0.197 | 78% | -7% | 16.43 | +1.25 | $11.8406 | +$11.4190 | 42m 31s
opencode/glm-5.1 | 0.547 | -0.242 | 67% | -19% | 12.06 | -3.12 | $1.8816 | +$1.4601 | 64m 39s
opencode/minimax-m2.5 | 0.481 | -0.308 | 56% | -30% | 18.87 | +3.69 | $0.6413 | +$0.2197 | 32m 15s
opencode/gpt-5.4-mini | 0.425 | -0.365 | 48% | -37% | 9.54 | -5.64 | $1.0606 | +$0.6391 | 21m 48s
opencode/minimax-m2.5-free | 0.415 | -0.375 | 59% | -26% | 16.19 | +1.01 | $0.0000 | -$0.4215 | 41m 34s
opencode/gemini-3-flash | 0.415 | -0.375 | 59% | -26% | 21.81 | +6.64 | $2.4307 | +$2.0091 | 62m 52s
opencode/gemini-3.1-pro | 0.291 | -0.498 | 37% | -48% | 12.70 | -2.47 | $5.8536 | +$5.4321 | 51m 25s
opencode/nemotron-3-super-free | 0.181 | -0.608 | 26% | -59% | 19.43 | +4.25 | $0.0000 | -$0.4215 | 109m 00s
Task story

Where opencode/gpt-5.4-nano separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin.
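A minimal sketch of that ordering, with hypothetical rows and field names (baseline_result, solver_count, gap) standing in for the table columns:

```python
# Hedged sketch of the "most revealing first" ordering described above.
# Rows and field names are hypothetical stand-ins for the table columns.
rows = [
    {"task": "task-a", "baseline_result": "passed", "solver_count": 9, "gap": 0.156},
    {"task": "task-b", "baseline_result": "failed", "solver_count": 3, "gap": 1.0},
    {"task": "task-c", "baseline_result": "passed", "solver_count": 1, "gap": 0.0},
]

def reveal_key(row):
    # Unsolved-by-the-baseline tasks first, then single-solver tasks,
    # then everything else ranked by how far the baseline trails the winner.
    return (row["baseline_result"] == "passed", row["solver_count"] != 1, -row["gap"])

rows.sort(key=reveal_key)
print([r["task"] for r in rows])   # ['task-b', 'task-c', 'task-a']
```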

Task | Field read | Baseline result | Winner | Winner score | Gap to winner | Baseline cost | Baseline time
SELinux registry volume label repair | Clear separation | failed | opencode/kimi-k2.5 | 1.0 | 1.0 | $0.0125 | 59s
Kubernetes rollout repair | Clear separation | failed | opencode/gpt-5.4-mini | 1.0 | 1.0 | $0.0199 | 1m 24s
Bootstrap phase validation repair | Competitive split | failed | opencode/kimi-k2.5 | 0.993 | 0.993 | $0.0231 | 1m 21s
ExternalDNS RFC2136 repair | Competitive split | failed | opencode/kimi-k2.5 | 0.982 | 0.982 | $0.0138 | 49s
Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 | 0.978 | 0.156 | $0.0175 | 1m 50s
K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle | 1.0 | 0.114 | $0.0089 | 32s
Event status shell summary | Competitive split | passed | opencode/big-pickle | 1.0 | 0.092 | $0.0089 | 34s
Ansible nginx role completion | Competitive split | passed | opencode/big-pickle | 0.963 | 0.074 | $0.0131 | 48s
Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle | 0.985 | 0.06 | $0.0127 | 44s
MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle | 0.954 | 0.054 | $0.0156 | 58s
CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle | 0.964 | 0.049 | $0.0128 | 45s
GitOps workspace render validation | Competitive split | passed | opencode/big-pickle | 0.941 | 0.046 | $0.0159 | 1m 01s
Log level rollup shell script | Competitive split | passed | opencode/big-pickle | 0.965 | 0.044 | $0.0113 | 49s
Wildcard TLS route coverage | Competitive split | passed | opencode/kimi-k2.5 | 0.929 | 0.013 | $0.0159 | 53s
Traefik forwarded header trust repair | Competitive split | passed | opencode/kimi-k2.5 | 0.913 | 0.009 | $0.0176 | 1m 17s
RHEL k3s node preparation repair | Competitive split | passed | opencode/gpt-5.4-nano | 1.0 | 0.0 | $0.0120 | 38s
Docker Compose observability fix | Competitive split | passed | opencode/gpt-5.4-nano | 0.975 | 0.0 | $0.0286 | 2m 08s
nftables router ingress repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.98 | 0.0 | $0.0084 | 27s
RHEL NetworkManager bridge VLAN repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.951 | 0.0 | $0.0090 | 34s
Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano | 0.942 | 0.0 | $0.0149 | 46s
Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.95 | 0.0 | $0.0172 | 1m 13s
RHEL edge firewalld router repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.953 | 0.0 | $0.0210 | 1m 10s
Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano | 0.932 | 0.0 | $0.0301 | 1m 47s
Pre-ArgoCD bootstrap sequencing | Competitive split | passed | opencode/gpt-5.4-nano | 0.967 | 0.0 | $0.0202 | 1m 13s
AppArmor dnsmasq profile repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.918 | 0.0 | $0.0145 | 1m 03s
Log audit shell script | Competitive split | passed | opencode/gpt-5.4-nano | 0.935 | 0.0 | $0.0107 | 48s
MetalLB ingress address pool repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.928 | 0.0 | $0.0152 | 1m 03s
Head to head

Direct matchups

Pairwise task records and top-line deltas show whether a challenger truly beats the baseline or merely looks cheaper or faster in isolation. Each edge column below is the baseline's value minus the challenger's, and the task record counts the baseline's wins first.
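A minimal sketch of how such a record is tallied, assuming per-task scores are compared head to head; the scores below are hypothetical toy data, not benchmark results.

```python
# Hedged sketch of the pairwise task record, assuming the "Task record"
# column counts from the baseline's side (baseline wins - challenger wins).
# Per-task scores are hypothetical toy data, not benchmark results.
baseline   = {"task-a": 0.95, "task-b": 0.00, "task-c": 0.60}
challenger = {"task-a": 0.95, "task-b": 1.00, "task-c": 0.40}

wins   = sum(baseline[t] > challenger[t] for t in baseline)
losses = sum(baseline[t] < challenger[t] for t in baseline)
ties   = len(baseline) - wins - losses

print(f"{wins}-{losses}, {ties} ties")   # "1-1, 1 ties" for this toy data
```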

Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge
opencode/nemotron-3-super-free | 23-0 (4 ties) | +0.608 | +59% | +$0.4215 | -81m 28s | -4.25
opencode/gemini-3.1-pro | 22-1 (4 ties) | +0.498 | +48% | -$5.4321 | -23m 52s | +2.47
opencode/minimax-m2.5-free | 23-2 (2 ties) | +0.375 | +26% | +$0.4215 | -14m 01s | -1.01
opencode/gemini-3-flash | 23-3 (1 tie) | +0.375 | +26% | -$2.0091 | -35m 19s | -6.64
opencode/gpt-5.4-mini | 19-5 (3 ties) | +0.365 | +37% | -$0.6391 | +5m 44s | +5.64
opencode/minimax-m2.5 | 18-6 (3 ties) | +0.308 | +30% | -$0.2197 | -4m 42s | -3.69
opencode/glm-5.1 | 23-4 (0 ties) | +0.242 | +19% | -$1.4601 | -37m 07s | +3.12
opencode/claude-sonnet-4-6 | 23-1 (3 ties) | +0.197 | +7% | -$11.4190 | -14m 58s | -1.25
opencode/gpt-5.4 | 23-3 (1 tie) | +0.181 | +7% | -$8.5611 | -5m 15s | +4.17
opencode/big-pickle | 14-10 (3 ties) | +0.174 | +19% | +$0.4215 | -8m 56s | -0.21
opencode/glm-5 | 23-2 (2 ties) | +0.166 | +7% | -$6.0124 | +7m 22s | +3.60
opencode/claude-opus-4-6 | 23-3 (1 tie) | +0.119 | -4% | -$21.4541 | -12m 31s | +0.30
opencode/kimi-k2.5 | 19-8 (0 ties) | +0.005 | -4% | -$0.4907 | -13m 32s | +0.92
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 413
Wall time: 27m 33s
Average task cost: $0.0153
Benchmark support: unknown
Catalog blended price: $0.4625 / 1M tok
Catalog speed: n/a
Intelligence: n/a
Agentic: n/a

The primary blended price is derived automatically from the OpenRouter listing openai/gpt-5.4-nano using a 3:1 input:output blend.
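As a hedged worked example, a 3:1 input:output blend is read here as a weighted average with three parts input price to one part output price; the per-token prices below are hypothetical placeholders, not the actual listing.

```python
# Hedged sketch of a 3:1 input:output blended price. Both per-token prices
# are hypothetical placeholders, not the openai/gpt-5.4-nano listing; only
# the weighting scheme (3 parts input to 1 part output) is the point.
input_price_per_mtok = 0.10    # $ per 1M input tokens (hypothetical)
output_price_per_mtok = 0.40   # $ per 1M output tokens (hypothetical)

blended = (3 * input_price_per_mtok + 1 * output_price_per_mtok) / 4
print(f"${blended:.4f} / 1M tok")   # $0.1750 / 1M tok for these placeholders
```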