ORPT-Bench model detail
Model benchmark profile

opencode/gpt-5.4

This page uses opencode/gpt-5.4 as the comparison baseline. Every chart and table below answers the same three questions: where this model leads, where it lags, and what it costs in quality, time, and request pressure.

openai · medium price tier · standard · release-frontier
Composite: 0.609 (correctness-weighted overall standing)
Success: 78% (tasks completed successfully)
ORPT: 11.00 (requests per solved task)
Total cost: $8.9827 (observed benchmark spend)
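
The four headline numbers are aggregates over individual task runs. The page does not publish the exact formulas, so the Python sketch below is only one plausible reading: the TaskRun fields, and in particular the choice to count only requests spent on solved tasks for ORPT, are assumptions rather than ORPT-Bench definitions, and the composite weighting is omitted entirely.

    from dataclasses import dataclass

    @dataclass
    class TaskRun:
        # Hypothetical per-task record; field names are assumptions.
        solved: bool       # did the run pass the task's checks
        requests: int      # API requests issued while working the task
        cost_usd: float    # observed spend on the task

    def summarize(runs: list[TaskRun]) -> dict:
        solved = [r for r in runs if r.solved]
        return {
            # Share of tasks completed successfully.
            "success": len(solved) / len(runs),
            # One plausible reading of "requests per solved task".
            "orpt": sum(r.requests for r in solved) / len(solved) if solved else float("nan"),
            # Observed benchmark spend across all tasks, solved or not.
            "total_cost_usd": round(sum(r.cost_usd for r in runs), 4),
        }

    print(summarize([TaskRun(True, 9, 0.31), TaskRun(False, 14, 0.52)]))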
Baseline comparison

How the field moves relative to opencode/gpt-5.4

These charts use opencode/gpt-5.4 as zero. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it.
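As a concrete illustration of how the deltas behind these charts can be derived, here is a minimal Python sketch. The baseline values come from the summary above; the helper itself is an assumption about how the page computes its bars, not published benchmark code.

    # Minimal sketch: deltas relative to the opencode/gpt-5.4 baseline.
    # Positive values sit above the baseline, negative values trail it.
    BASELINE = {"composite": 0.609, "success": 0.78, "cost_usd": 8.9827}

    def delta_vs_baseline(model: dict) -> dict:
        """Return metric deltas (model minus baseline) for charting."""
        return {k: round(model[k] - BASELINE[k], 4) for k in BASELINE}

    # Example: opencode/gpt-5.4-nano from the decision table below.
    print(delta_vs_baseline({"composite": 0.789, "success": 0.85, "cost_usd": 0.4215}))
    # -> {'composite': 0.18, 'success': 0.07, 'cost_usd': -8.5612}
    # (the table's deltas differ in the last digit because they are
    #  computed from unrounded scores)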

Charts: Composite delta vs baseline · Success delta vs baseline · Cost delta vs baseline · Wall time delta vs baseline

Decision table

Field comparison against the baseline

Use this to decide whether another model beats opencode/gpt-5.4 by enough to justify switching; a small scripted version of that check follows the table.

Model | Composite | Composite delta | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time
opencode/gpt-5.4-nano | 0.789 | +0.181 | 85% | +7% | 15.17 | +4.17 | $0.4215 | -$8.5611 | 27m 33s
opencode/kimi-k2.5 | 0.785 | +0.176 | 89% | +11% | 14.25 | +3.25 | $0.9122 | -$8.0705 | 41m 05s
opencode/claude-opus-4-6 | 0.670 | +0.062 | 89% | +11% | 14.88 | +3.88 | $21.8757 | +$12.8930 | 40m 04s
opencode/glm-5 | 0.623 | +0.014 | 78% | +0% | 11.57 | +0.57 | $6.4339 | -$2.5488 | 20m 10s
opencode/big-pickle | 0.615 | +0.006 | 67% | -11% | 15.39 | +4.39 | $0.0000 | -$8.9827 | 36m 28s
opencode/gpt-5.4 (baseline) | 0.609 | +0.000 | 78% | +0% | 11.00 | +0.00 | $8.9827 | +$0.0000 | 32m 47s
opencode/claude-sonnet-4-6 | 0.593 | -0.016 | 78% | +0% | 16.43 | +5.43 | $11.8406 | +$2.8579 | 42m 31s
opencode/glm-5.1 | 0.547 | -0.062 | 67% | -11% | 12.06 | +1.06 | $1.8816 | -$7.1011 | 64m 39s
opencode/minimax-m2.5 | 0.481 | -0.128 | 56% | -22% | 18.87 | +7.87 | $0.6413 | -$8.3414 | 32m 15s
opencode/gpt-5.4-mini | 0.425 | -0.184 | 48% | -30% | 9.54 | -1.46 | $1.0606 | -$7.9221 | 21m 48s
opencode/minimax-m2.5-free | 0.415 | -0.194 | 59% | -19% | 16.19 | +5.19 | $0.0000 | -$8.9827 | 41m 34s
opencode/gemini-3-flash | 0.415 | -0.194 | 59% | -19% | 21.81 | +10.81 | $2.4307 | -$6.5520 | 62m 52s
opencode/gemini-3.1-pro | 0.291 | -0.318 | 37% | -41% | 12.70 | +1.70 | $5.8536 | -$3.1291 | 51m 25s
opencode/nemotron-3-super-free | 0.181 | -0.427 | 26% | -52% | 19.43 | +8.43 | $0.0000 | -$8.9827 | 109m 00s
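
The scripted check below (Python) mirrors the table's intent under assumed thresholds: the row fields correspond to columns in the table above, but the quality and cost cut-offs are illustrative choices, not part of ORPT-Bench.

    # Sketch of a switch decision against the baseline row. The thresholds
    # (at least +0.05 composite and at most +$2 extra spend) are
    # illustrative assumptions, not benchmark-defined rules.
    def worth_switching(row: dict,
                        min_composite_gain: float = 0.05,
                        max_extra_cost_usd: float = 2.0) -> bool:
        return (row["composite_delta"] >= min_composite_gain
                and row["cost_delta_usd"] <= max_extra_cost_usd)

    # Rows lifted from the decision table above.
    print(worth_switching({"composite_delta": +0.181, "cost_delta_usd": -8.5611}))   # nano: True
    print(worth_switching({"composite_delta": +0.062, "cost_delta_usd": +12.8930}))  # opus: False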
Task story

Where opencode/gpt-5.4 separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin; a sketch of that ordering follows the table.

Task | Field read | Baseline result | Winner | Winner score | Gap to winner | Baseline cost | Baseline time
RHEL k3s node preparation repair | Competitive split | failed | opencode/gpt-5.4-nano | 1.0 | 1.0 | $0.3012 | 51s
Bootstrap phase validation repair | Competitive split | failed | opencode/kimi-k2.5 | 0.993 | 0.993 | $0.4329 | 1m 27s
nftables router ingress repair | Competitive split | failed | opencode/gpt-5.4-nano | 0.98 | 0.98 | $0.3503 | 1m 13s
Docker Compose observability fix | Competitive split | failed | opencode/gpt-5.4-nano | 0.975 | 0.975 | $0.2825 | 46s
Pre-ArgoCD bootstrap sequencing | Competitive split | failed | opencode/gpt-5.4-nano | 0.967 | 0.967 | $0.4099 | 1m 21s
RHEL edge firewalld router repair | Competitive split | failed | opencode/gpt-5.4-nano | 0.953 | 0.953 | $0.2606 | 41s
K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle | 1.0 | 0.242 | $0.2694 | 1m 16s
Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle | 0.985 | 0.237 | $0.4056 | 2m 07s
Event status shell summary | Competitive split | passed | opencode/big-pickle | 1.0 | 0.217 | $0.2351 | 33s
Kubernetes rollout repair | Clear separation | passed | opencode/gpt-5.4-mini | 1.0 | 0.209 | $0.3143 | 59s
Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 | 0.978 | 0.203 | $0.2631 | 43s
RHEL NetworkManager bridge VLAN repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.951 | 0.2 | $0.3071 | 1m 02s
SELinux registry volume label repair | Clear separation | passed | opencode/kimi-k2.5 | 1.0 | 0.198 | $0.3297 | 1m 09s
CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle | 0.964 | 0.191 | $0.3379 | 1m 12s
Log level rollup shell script | Competitive split | passed | opencode/big-pickle | 0.965 | 0.173 | $0.2457 | 48s
Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano | 0.942 | 0.172 | $0.4417 | 1m 14s
MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle | 0.954 | 0.172 | $0.3493 | 1m 11s
Ansible nginx role completion | Competitive split | passed | opencode/big-pickle | 0.963 | 0.171 | $0.2740 | 41s
Log audit shell script | Competitive split | passed | opencode/gpt-5.4-nano | 0.935 | 0.167 | $0.2761 | 1m 06s
ExternalDNS RFC2136 repair | Competitive split | passed | opencode/kimi-k2.5 | 0.982 | 0.167 | $0.3297 | 1m 02s
GitOps workspace render validation | Competitive split | passed | opencode/big-pickle | 0.941 | 0.161 | $0.3886 | 1m 20s
Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano | 0.932 | 0.158 | $0.5689 | 3m 35s
Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.95 | 0.146 | $0.3245 | 1m 05s
Wildcard TLS route coverage | Competitive split | passed | opencode/kimi-k2.5 | 0.929 | 0.133 | $0.3123 | 47s
Traefik forwarded header trust repair | Competitive split | passed | opencode/kimi-k2.5 | 0.913 | 0.13 | $0.3067 | 2m 18s
AppArmor dnsmasq profile repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.918 | 0.126 | $0.3399 | 1m 29s
MetalLB ingress address pool repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.928 | 0.125 | $0.3259 | 53s
Head to head

Direct matchups

Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Each edge is reported from the baseline's perspective (baseline value minus challenger value), so positive composite and success edges favor the baseline; a small sketch of that computation follows the table.

Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge
opencode/nemotron-3-super-free | 21-0 (6 ties) | +0.427 | +52% | +$8.9827 | -76m 13s | -8.43
opencode/gemini-3.1-pro | 17-5 (5 ties) | +0.318 | +41% | +$3.1291 | -18m 38s | -1.70
opencode/minimax-m2.5-free | 21-1 (5 ties) | +0.194 | +19% | +$8.9827 | -8m 46s | -5.19
opencode/gemini-3-flash | 21-2 (4 ties) | +0.194 | +19% | +$6.5520 | -30m 04s | -10.81
opencode/gpt-5.4-mini | 8-13 (6 ties) | +0.184 | +30% | +$7.9221 | +10m 59s | +1.46
opencode/gpt-5.4-nano | 3-23 (1 tie) | -0.181 | -7% | +$8.5611 | +5m 15s | -4.17
opencode/kimi-k2.5 | 2-24 (1 tie) | -0.176 | -11% | +$8.0705 | -8m 17s | -3.25
opencode/minimax-m2.5 | 11-13 (3 ties) | +0.128 | +22% | +$8.3414 | +32s | -7.87
opencode/glm-5.1 | 6-17 (4 ties) | +0.062 | +11% | +$7.1011 | -31m 52s | -1.06
opencode/claude-opus-4-6 | 21-5 (1 tie) | -0.062 | -11% | -$12.8930 | -7m 17s | -3.88
opencode/claude-sonnet-4-6 | 19-7 (1 tie) | +0.016 | +0% | -$2.8579 | -9m 43s | -5.43
opencode/glm-5 | 7-16 (4 ties) | -0.014 | +0% | +$2.5488 | +12m 37s | -0.57
opencode/big-pickle | 6-18 (3 ties) | -0.006 | +11% | +$8.9827 | -3m 41s | -4.39
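
A minimal Python sketch of the edge computation, consistent with the rows above (every edge works out to baseline value minus challenger value). How per-task wins and ties are counted is not spelled out on this page, so that part is omitted.

    # Top-line edges from the baseline's perspective: baseline minus challenger.
    BASELINE = {"composite": 0.609, "success_pct": 78, "cost_usd": 8.9827, "orpt": 11.00}

    def edges(challenger: dict) -> dict:
        return {k: round(BASELINE[k] - challenger[k], 4) for k in BASELINE}

    # opencode/gpt-5.4-nano, using its decision-table values.
    print(edges({"composite": 0.789, "success_pct": 85, "cost_usd": 0.4215, "orpt": 15.17}))
    # -> {'composite': -0.18, 'success_pct': -7, 'cost_usd': 8.5612, 'orpt': -4.17}
    # (the table shows -0.181 and +$8.5611 because it works from unrounded values)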
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 280
Wall time: 32m 47s
Average task cost: $0.3307
Benchmark support: unknown
Catalog blended price: $5.6000 / 1M tok
Catalog speed: 74 tok/s
Intelligence: 57
Agentic: n/a

OpenRouter reference blend for openai/gpt-5.4 is 5.625 USD per 1M tokens using a 3:1 input:output mix.
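
For reference, a 3:1 input:output mix means the blended figure is a weighted average of three parts input price to one part output price. The per-1M-token prices in the sketch below are hypothetical, chosen only so the arithmetic reproduces the 5.625 reference; they are not published rates for this model.

    # Blended price per 1M tokens for a 3:1 input:output usage mix.
    def blended_price(input_per_mtok: float, output_per_mtok: float,
                      input_share: int = 3, output_share: int = 1) -> float:
        total = input_share + output_share
        return (input_share * input_per_mtok + output_share * output_per_mtok) / total

    # Hypothetical per-1M-token prices picked to reproduce the 5.625 reference.
    print(blended_price(2.50, 15.00))  # -> 5.625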