ORPT-Bench model detail
Model benchmark profile

opencode/claude-sonnet-4-6

This page uses opencode/claude-sonnet-4-6 as the comparison baseline. Every chart and table below answers the same question: where this model leads, where it lags, and what that costs in quality, time, and request pressure.

Catalog tags: anthropic, medium price tier, standard, balanced-general

Composite: 0.593 (correctness-weighted overall standing)
Success: 78% (tasks completed successfully)
ORPT: 16.43 (requests per solved task)
Total cost: $11.8406 (observed benchmark spend)
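
The page does not spell out how ORPT is aggregated; "requests per solved task" is the only definition given. As an illustration only, the sketch below uses hypothetical per-task request counts and reads the metric as total requests divided by the number of solved tasks; a per-task average over solved tasks would be an equally plausible reading.

```python
# Hypothetical per-task records: (requests used, solved?). Not real benchmark data.
task_runs = [
    (12, True),
    (31, True),
    (9, False),
    (22, True),
]

total_requests = sum(requests for requests, _ in task_runs)
solved_tasks = sum(1 for _, solved in task_runs if solved)

# One plausible reading of "requests per solved task".
orpt = total_requests / solved_tasks if solved_tasks else float("inf")
print(f"ORPT = {orpt:.2f}")  # 74 requests / 3 solved tasks = 24.67
```
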
Baseline comparison

How the field moves relative to opencode/claude-sonnet-4-6

These charts use opencode/claude-sonnet-4-6 as zero. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it.
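
Each bar is simply the other model's value minus the baseline's value for the same metric. A minimal sketch of that computation, using a couple of hypothetical metric snapshots rather than the full benchmark output:

```python
# Hypothetical metric snapshots; keys and values are illustrative only.
baseline = {"composite": 0.593, "success": 0.78, "cost_usd": 11.8406}
challenger = {"composite": 0.789, "success": 0.85, "cost_usd": 0.4215}

def delta_vs_baseline(model: dict, base: dict) -> dict:
    """Positive values mean the model sits above the baseline on that metric."""
    return {metric: model[metric] - base[metric] for metric in base}

print(delta_vs_baseline(challenger, baseline))
# {'composite': 0.196, 'success': 0.07, 'cost_usd': -11.419} (up to float rounding)
```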

Charts: composite delta vs baseline, success delta vs baseline, cost delta vs baseline, wall time delta vs baseline.

Decision table

Field comparison against the baseline

Use this to decide whether another model beats opencode/claude-sonnet-4-6 by enough to justify a switch.

Model Composite Composite delta Success Success delta ORPT ORPT delta Cost Cost delta Wall time
opencode/gpt-5.4-nano 0.789 +0.197 85% +7% 15.17 -1.25 $0.4215 -$11.4190 27m 33s
opencode/kimi-k2.5 0.785 +0.192 89% +11% 14.25 -2.18 $0.9122 -$10.9283 41m 05s
opencode/claude-opus-4-6 0.670 +0.077 89% +11% 14.88 -1.55 $21.8757 +$10.0351 40m 04s
opencode/glm-5 0.623 +0.030 78% +0% 11.57 -4.86 $6.4339 -$5.4066 20m 10s
opencode/big-pickle 0.615 +0.022 67% -11% 15.39 -1.04 $0.0000 -$11.8406 36m 28s
opencode/gpt-5.4 0.609 +0.016 78% +0% 11.00 -5.43 $8.9827 -$2.8579 32m 47s
opencode/claude-sonnet-4-6 (baseline) 0.593 +0.000 78% +0% 16.43 +0.00 $11.8406 +$0.0000 42m 31s
opencode/glm-5.1 0.547 -0.046 67% -11% 12.06 -4.37 $1.8816 -$9.9589 64m 39s
opencode/minimax-m2.5 0.481 -0.112 56% -22% 18.87 +2.44 $0.6413 -$11.1993 32m 15s
opencode/gpt-5.4-mini 0.425 -0.168 48% -30% 9.54 -6.89 $1.0606 -$10.7800 21m 48s
opencode/minimax-m2.5-free 0.415 -0.178 59% -19% 16.19 -0.24 $0.0000 -$11.8406 41m 34s
opencode/gemini-3-flash 0.415 -0.178 59% -19% 21.81 +5.38 $2.4307 -$9.4099 62m 52s
opencode/gemini-3.1-pro 0.291 -0.302 37% -41% 12.70 -3.73 $5.8536 -$5.9869 51m 25s
opencode/nemotron-3-super-free 0.181 -0.411 26% -52% 19.43 +3.00 $0.0000 -$11.8406 109m 00s
Task story

Where opencode/claude-sonnet-4-6 separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin.
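
"Gap to winner" is the winning model's score minus the baseline's score on that task. A simplified sketch of that ordering, ranking purely by gap and using hypothetical task names and scores (the real table also prioritises unsolved and single-solver tasks):

```python
# Hypothetical per-task scores; names and values are illustrative only.
tasks = [
    {"name": "registry repair", "baseline": 0.0, "winner": "model-a", "winner_score": 1.0},
    {"name": "rollout repair", "baseline": 0.748, "winner": "model-b", "winner_score": 1.0},
    {"name": "tls coverage", "baseline": 0.757, "winner": "model-a", "winner_score": 0.929},
]

for task in tasks:
    # Gap to winner: how far the baseline trails the best score on this task.
    task["gap"] = task["winner_score"] - task["baseline"]

# Most revealing tasks first: the biggest gaps (including outright failures) on top.
for task in sorted(tasks, key=lambda t: t["gap"], reverse=True):
    print(f"{task['name']}: gap {task['gap']:.3f} to {task['winner']}")
```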

Task Field read Baseline result Winner Winner score Gap to winner Baseline cost Baseline time
SELinux registry volume label repair Clear separation failed opencode/kimi-k2.5 1.0 1.0 $0.4442 54s
Kubernetes rollout repair Clear separation failed opencode/gpt-5.4-mini 1.0 1.0 $0.3864 42s
ExternalDNS RFC2136 repair Competitive split failed opencode/kimi-k2.5 0.982 0.982 $0.2057 5m 01s
Docker Compose observability fix Competitive split failed opencode/gpt-5.4-nano 0.975 0.975 $0.4126 53s
CNPG restore manifest repair Competitive split dnf opencode/big-pickle 0.964 0.964 $0.2644 5m 00s
Workspace runtime access convergence Competitive split failed opencode/gpt-5.4-nano 0.932 0.932 $0.1763 5m 01s
RHEL k3s node preparation repair Competitive split passed opencode/gpt-5.4-nano 1.0 0.252 $0.6228 1m 31s
Event status shell summary Competitive split passed opencode/big-pickle 1.0 0.239 $0.3380 33s
nftables router ingress repair Competitive split passed opencode/gpt-5.4-nano 0.98 0.238 $0.4760 1m 06s
Workspace transplant bundle repair Competitive split passed opencode/big-pickle 0.985 0.237 $0.4336 1m 24s
K3s registry mirror trust repair Competitive split passed opencode/big-pickle 1.0 0.234 $0.3419 26s
Terraform static site repair Competitive split passed opencode/kimi-k2.5 0.978 0.207 $0.4026 34s
Bootstrap phase validation repair Competitive split passed opencode/kimi-k2.5 0.993 0.206 $0.5790 1m 24s
RHEL NetworkManager bridge VLAN repair Competitive split passed opencode/gpt-5.4-nano 0.951 0.205 $0.5330 57s
Ansible nginx role completion Competitive split passed opencode/big-pickle 0.963 0.203 $0.3992 42s
GitOps workspace render validation Competitive split passed opencode/big-pickle 0.941 0.202 $0.5677 3m 09s
Build workspace plane convergence Competitive split passed opencode/gpt-5.4-nano 0.942 0.198 $0.5629 2m 07s
MCP OpenBao contract repair Competitive split passed opencode/big-pickle 0.954 0.195 $0.5186 56s
Log level rollup shell script Competitive split passed opencode/big-pickle 0.965 0.187 $0.3619 32s
Kubernetes OIDC RBAC repair Competitive split passed opencode/gpt-5.4-nano 0.95 0.181 $0.5007 1m 15s
Pre-ArgoCD bootstrap sequencing Competitive split passed opencode/gpt-5.4-nano 0.967 0.175 $0.4631 1m 16s
Wildcard TLS route coverage Competitive split passed opencode/kimi-k2.5 0.929 0.172 $0.4915 1m 08s
Log audit shell script Competitive split passed opencode/gpt-5.4-nano 0.935 0.169 $0.3782 45s
MetalLB ingress address pool repair Competitive split passed opencode/gpt-5.4-nano 0.928 0.169 $0.5138 1m 20s
RHEL edge firewalld router repair Competitive split passed opencode/gpt-5.4-nano 0.953 0.168 $0.4893 1m 27s
AppArmor dnsmasq profile repair Competitive split passed opencode/gpt-5.4-nano 0.918 0.164 $0.5205 1m 18s
Traefik forwarded header trust repair Competitive split passed opencode/kimi-k2.5 0.913 0.141 $0.4565 1m 10s
Head to head

Direct matchups

Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Records read baseline wins first, and each edge column is the baseline value minus the challenger value on that metric.
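
A minimal sketch of how such a pairwise record can be tallied from per-task scores; the scores, tie tolerance, and the idea of comparing raw task scores are assumptions, not the benchmark's documented procedure:

```python
# Hypothetical per-task scores for the baseline and one challenger.
baseline_scores = [1.0, 0.75, 0.0, 0.93, 0.40]
challenger_scores = [0.2, 0.75, 1.0, 0.90, 0.65]

TIE_TOLERANCE = 0.0  # treat exactly equal scores as ties

wins = losses = ties = 0
for base, chal in zip(baseline_scores, challenger_scores):
    if abs(base - chal) <= TIE_TOLERANCE:
        ties += 1
    elif base > chal:
        wins += 1
    else:
        losses += 1

# The record reads baseline wins first, e.g. "2-2, 1 tie" for these scores.
print(f"{wins}-{losses}, {ties} tie{'s' if ties != 1 else ''}")
```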

Challenger Task record Composite edge Success edge Cost edge Time edge ORPT edge
opencode/nemotron-3-super-free 21-0, 6 ties +0.411 +52% +$11.8406 -66m 30s -3.00
opencode/gemini-3.1-pro 15-8, 4 ties +0.302 +41% +$5.9869 -8m 54s +3.73
opencode/gpt-5.4-nano 1-23, 3 ties -0.197 -7% +$11.4190 +14m 58s +1.25
opencode/kimi-k2.5 3-24, 0 ties -0.192 -11% +$10.9283 +1m 26s +2.18
opencode/minimax-m2.5-free 21-4, 2 ties +0.178 +19% +$11.8406 +57s +0.24
opencode/gemini-3-flash 21-4, 2 ties +0.178 +19% +$9.4099 -20m 21s -5.38
opencode/gpt-5.4-mini 10-13, 4 ties +0.168 +30% +$10.7800 +20m 42s +6.89
opencode/minimax-m2.5 8-14, 5 ties +0.112 +22% +$11.1993 +10m 16s -2.44
opencode/claude-opus-4-6 18-7, 2 ties -0.077 -11% -$10.0351 +2m 26s +1.55
opencode/glm-5.1 8-18, 1 tie +0.046 +11% +$9.9589 -22m 09s +4.37
opencode/glm-5 7-18, 2 ties -0.030 +0% +$5.4066 +22m 20s +4.86
opencode/big-pickle 5-18, 4 ties -0.022 +11% +$11.8406 +6m 02s +1.04
opencode/gpt-5.4 7-19, 1 tie -0.016 +0% +$2.8579 +9m 43s +5.43
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 403
Wall time: 42m 31s
Average task cost: $0.4739
Benchmark support: unknown
Catalog blended price: $6.0000 / 1M tok
Catalog speed: 67 tok/s
Intelligence: 52
Agentic: n/a

The OpenRouter reference blend for anthropic/claude-opus-4.6-fast is 60 USD per 1M tokens, using a 3:1 input:output mix.
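
A blended per-token price like the catalog figures above is typically a weighted average of the input and output prices. The sketch below assumes the stated 3:1 input:output mix and uses hypothetical per-million-token prices; neither number is taken from this page.

```python
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted-average price per 1M tokens for a given input:output token mix."""
    total_weight = input_weight + output_weight
    return (input_per_mtok * input_weight + output_per_mtok * output_weight) / total_weight

# Hypothetical catalog prices in USD per 1M tokens.
print(blended_price(3.00, 15.00))   # 3:1 mix -> (3*3.00 + 15.00) / 4 = 6.00
print(blended_price(15.00, 75.00))  # 3:1 mix -> (3*15.00 + 75.00) / 4 = 30.00
```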