ORPT-Bench model detail
Model benchmark profile

opencode/claude-opus-4-6

This page uses opencode/claude-opus-4-6 as the comparison baseline. Every chart and table below answers the same questions: where this model leads, where it lags, and what it costs in quality, time, and request pressure.

Tags: anthropic · high price tier · expensive · balanced-general
Composite: 0.67 (correctness-weighted overall standing)
Success: 89% (tasks completed successfully)
ORPT: 14.88 (requests per solved task)
Total cost: $21.8757 (observed benchmark spend)
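
As a minimal sketch of how the Success, ORPT, and Total cost cards could be derived from per-task records, assuming ORPT averages requests over solved tasks only (the TaskResult shape is hypothetical, and the correctness-weighted Composite is omitted because its weighting is not documented on this page):

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are illustrative."""
    solved: bool
    requests: int
    cost_usd: float

def headline_metrics(results: list[TaskResult]) -> dict[str, float]:
    solved = [r for r in results if r.solved]
    return {
        # Success: share of tasks completed successfully.
        "success": len(solved) / len(results),
        # ORPT: requests per solved task (assumed: averaged over solved tasks only).
        "orpt": sum(r.requests for r in solved) / len(solved),
        # Total cost: observed benchmark spend across all tasks.
        "total_cost_usd": sum(r.cost_usd for r in results),
    }
```
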
Baseline comparison

How the field moves relative to opencode/claude-opus-4-6

These charts use opencode/claude-opus-4-6 as the zero line. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it. A sketch of the convention follows the chart list.

[Chart panels: Composite delta vs baseline · Success delta vs baseline · Cost delta vs baseline · Wall time delta vs baseline]
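
As a sketch of that convention (the metrics mapping below is illustrative), each bar is simply a model's value minus the baseline's, so the baseline itself sits at zero:

```python
BASELINE = "opencode/claude-opus-4-6"

def deltas_vs_baseline(metrics: dict[str, dict[str, float]], key: str) -> dict[str, float]:
    """Signed per-model delta on one metric; positive means above the baseline."""
    base = metrics[BASELINE][key]
    return {model: values[key] - base for model, values in metrics.items()}

# Example with composite scores from the decision table below:
scores = {BASELINE: {"composite": 0.67},
          "opencode/gpt-5.4-nano": {"composite": 0.789}}
print(deltas_vs_baseline(scores, "composite"))  # nano ≈ +0.119, baseline 0.0
```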

Decision table

Field comparison against the baseline

Use this to decide whether another model beats opencode/claude-opus-4-6 by enough to justify a switch; a filtering sketch follows the table.

| Model | Composite | Composite delta | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time |
|---|---|---|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 0.789 | +0.119 | 85% | -4% | 15.17 | +0.30 | $0.4215 | -$21.4541 | 27m 33s |
| opencode/kimi-k2.5 | 0.785 | +0.114 | 89% | +0% | 14.25 | -0.63 | $0.9122 | -$20.9634 | 41m 05s |
| opencode/claude-opus-4-6 (baseline) | 0.67 | +0.0 | 89% | +0% | 14.88 | +0.00 | $21.8757 | +$0.0000 | 40m 04s |
| opencode/glm-5 | 0.623 | -0.047 | 78% | -11% | 11.57 | -3.30 | $6.4339 | -$15.4417 | 20m 10s |
| opencode/big-pickle | 0.615 | -0.055 | 67% | -22% | 15.39 | +0.51 | $0.0000 | -$21.8757 | 36m 28s |
| opencode/gpt-5.4 | 0.609 | -0.062 | 78% | -11% | 11.00 | -3.88 | $8.9827 | -$12.8930 | 32m 47s |
| opencode/claude-sonnet-4-6 | 0.593 | -0.077 | 78% | -11% | 16.43 | +1.55 | $11.8406 | -$10.0351 | 42m 31s |
| opencode/glm-5.1 | 0.547 | -0.123 | 67% | -22% | 12.06 | -2.82 | $1.8816 | -$19.9941 | 64m 39s |
| opencode/minimax-m2.5 | 0.481 | -0.189 | 56% | -33% | 18.87 | +3.99 | $0.6413 | -$21.2344 | 32m 15s |
| opencode/gpt-5.4-mini | 0.425 | -0.245 | 48% | -41% | 9.54 | -5.34 | $1.0606 | -$20.8151 | 21m 48s |
| opencode/minimax-m2.5-free | 0.415 | -0.255 | 59% | -30% | 16.19 | +1.31 | $0.0000 | -$21.8757 | 41m 34s |
| opencode/gemini-3-flash | 0.415 | -0.255 | 59% | -30% | 21.81 | +6.94 | $2.4307 | -$19.4450 | 62m 52s |
| opencode/gemini-3.1-pro | 0.291 | -0.379 | 37% | -52% | 12.70 | -2.18 | $5.8536 | -$16.0221 | 51m 25s |
| opencode/nemotron-3-super-free | 0.181 | -0.489 | 26% | -63% | 19.43 | +4.55 | $0.0000 | -$21.8757 | 109m 00s |
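
One way to mechanize that call, as a rough sketch (the thresholds and row fields are illustrative, not part of ORPT-Bench):

```python
def worth_switching(row: dict[str, float],
                    min_composite_gain: float = 0.05,
                    max_success_drop: float = 0.0) -> bool:
    """Flag a challenger only if it gains enough composite score without
    giving up more success rate than tolerated."""
    return (row["composite_delta"] >= min_composite_gain
            and row["success_delta"] >= -max_success_drop)
```

Under those example thresholds, only opencode/kimi-k2.5 clears both bars; opencode/gpt-5.4-nano gains slightly more composite but trades away 4 points of success.
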
Task story

Where opencode/claude-opus-4-6 separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin. A sketch of that ordering follows the table.

| Task | Field read | Baseline result | Winner | Winner score | Gap to winner | Baseline cost | Baseline time |
|---|---|---|---|---|---|---|---|
| Kubernetes rollout repair | Clear separation | failed | opencode/gpt-5.4-mini | 1.0 | 1.0 | $0.7268 | 56s |
| Docker Compose observability fix | Competitive split | failed | opencode/gpt-5.4-nano | 0.975 | 0.975 | $1.2297 | 4m 16s |
| MetalLB ingress address pool repair | Competitive split | failed | opencode/gpt-5.4-nano | 0.928 | 0.928 | $0.7403 | 1m 04s |
| K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle | 1.0 | 0.263 | $0.6582 | 59s |
| RHEL k3s node preparation repair | Competitive split | passed | opencode/gpt-5.4-nano | 1.0 | 0.255 | $0.9392 | 2m 03s |
| Event status shell summary | Competitive split | passed | opencode/big-pickle | 1.0 | 0.251 | $0.5796 | 38s |
| nftables router ingress repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.98 | 0.245 | $0.7916 | 1m 07s |
| Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle | 0.985 | 0.241 | $0.7076 | 1m 24s |
| SELinux registry volume label repair | Clear separation | passed | opencode/kimi-k2.5 | 1.0 | 0.241 | $0.7973 | 1m 57s |
| CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle | 0.964 | 0.228 | $0.9118 | 1m 29s |
| Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 | 0.978 | 0.223 | $0.6775 | 46s |
| Ansible nginx role completion | Competitive split | passed | opencode/big-pickle | 0.963 | 0.216 | $0.7302 | 1m 02s |
| ExternalDNS RFC2136 repair | Competitive split | passed | opencode/kimi-k2.5 | 0.982 | 0.215 | $0.7899 | 1m 44s |
| RHEL NetworkManager bridge VLAN repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.951 | 0.21 | $0.7664 | 1m 01s |
| MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle | 0.954 | 0.207 | $0.7456 | 1m 22s |
| Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano | 0.942 | 0.206 | $1.1061 | 2m 42s |
| Pre-ArgoCD bootstrap sequencing | Competitive split | passed | opencode/gpt-5.4-nano | 0.967 | 0.205 | $0.8700 | 2m 28s |
| Log level rollup shell script | Competitive split | passed | opencode/big-pickle | 0.965 | 0.205 | $0.5953 | 50s |
| Bootstrap phase validation repair | Competitive split | passed | opencode/kimi-k2.5 | 0.993 | 0.2 | $0.8939 | 1m 20s |
| GitOps workspace render validation | Competitive split | passed | opencode/big-pickle | 0.941 | 0.194 | $0.9304 | 1m 17s |
| Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.95 | 0.185 | $0.8019 | 1m 13s |
| Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano | 0.932 | 0.178 | $1.1706 | 3m 01s |
| RHEL edge firewalld router repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.953 | 0.177 | $0.7708 | 1m 17s |
| Log audit shell script | Competitive split | passed | opencode/gpt-5.4-nano | 0.935 | 0.172 | $0.6065 | 39s |
| Wildcard TLS route coverage | Competitive split | passed | opencode/kimi-k2.5 | 0.929 | 0.168 | $0.7877 | 1m 00s |
| AppArmor dnsmasq profile repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.918 | 0.167 | $0.8037 | 1m 21s |
| Traefik forwarded header trust repair | Competitive split | passed | opencode/kimi-k2.5 | 0.913 | 0.147 | $0.7472 | 1m 08s |
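
A sketch of that ordering (the solver_count field and the exact tie-breaks are assumptions; the page does not publish its sort key):

```python
def story_order(task: dict) -> tuple:
    """Unsolved tasks first, then single-solver tasks, then everything
    else by descending gap to the winner."""
    return (
        task["baseline_result"] == "passed",  # failed tasks sort first
        task["solver_count"] > 1,             # single-solver tasks next
        -task["gap_to_winner"],               # larger gaps are more revealing
    )

# tasks = [...]  # row dicts parsed from the table above
# tasks.sort(key=story_order)
```
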
Head to head

Direct matchups

Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. All edges are baseline minus challenger, and task records count baseline wins first: positive composite and success edges favor the baseline, while for cost, time, and ORPT (where lower is better) negative edges do. A tallying sketch follows the table.

| Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge |
|---|---|---|---|---|---|---|
| opencode/nemotron-3-super-free | 24-0, 3 ties | +0.489 | +63% | +$21.8757 | -68m 56s | -4.55 |
| opencode/gemini-3.1-pro | 17-8, 2 ties | +0.379 | +52% | +$16.0221 | -11m 21s | +2.18 |
| opencode/minimax-m2.5-free | 24-2, 1 tie | +0.255 | +30% | +$21.8757 | -1m 29s | -1.31 |
| opencode/gemini-3-flash | 24-1, 2 ties | +0.255 | +30% | +$19.4450 | -22m 47s | -6.94 |
| opencode/gpt-5.4-mini | 12-13, 2 ties | +0.245 | +41% | +$20.8151 | +18m 16s | +5.34 |
| opencode/minimax-m2.5 | 10-14, 3 ties | +0.189 | +33% | +$21.2344 | +7m 49s | -3.99 |
| opencode/glm-5.1 | 7-18, 2 ties | +0.123 | +22% | +$19.9941 | -24m 35s | +2.82 |
| opencode/gpt-5.4-nano | 3-23, 1 tie | -0.119 | +4% | +$21.4541 | +12m 31s | -0.30 |
| opencode/kimi-k2.5 | 3-24, 0 ties | -0.114 | +0% | +$20.9634 | -1m 00s | +0.63 |
| opencode/claude-sonnet-4-6 | 7-18, 2 ties | +0.077 | +11% | +$10.0351 | -2m 26s | -1.55 |
| opencode/gpt-5.4 | 5-21, 1 tie | +0.062 | +11% | +$12.8930 | +7m 17s | +3.88 |
| opencode/big-pickle | 7-18, 2 ties | +0.055 | +22% | +$21.8757 | +3m 36s | -0.51 |
| opencode/glm-5 | 6-20, 1 tie | +0.047 | +11% | +$15.4417 | +19m 54s | +3.30 |
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 418
Wall time: 40m 04s
Average task cost: $0.7991
Benchmark support: unknown
Catalog blended price: $10.0000 / 1M tok
Catalog speed: 49 tok/s
Intelligence: 53
Agentic: n/a

The OpenRouter reference blend for anthropic/claude-opus-4.6-fast is 60 USD per 1M tokens, using a 3:1 input:output mix.
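
A 3:1 blend is a token-weighted average of the two per-direction prices. The sketch below shows the arithmetic; the input and output prices are hypothetical, chosen only so the blend reproduces the 60 USD figure:

```python
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  in_parts: int = 3, out_parts: int = 1) -> float:
    """USD per 1M tokens for a mix of in_parts input tokens
    per out_parts output tokens."""
    total = in_parts * input_per_mtok + out_parts * output_per_mtok
    return total / (in_parts + out_parts)

# Hypothetical per-direction split that yields the 60 USD/1M reference blend:
print(blended_price(20.0, 180.0))  # (3*20 + 180) / 4 = 60.0
```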