ORPT-Bench model detail
Model benchmark profile

opencode/kimi-k2.5

This page uses opencode/kimi-k2.5 as the comparison baseline. Every chart and table below is meant to answer the same questions: where this model leads, where it lags, and what it costs in quality, time, and request pressure.

moonshot · low price tier · standard

Composite: 0.785 (correctness-weighted overall standing)
Success: 89% (tasks completed successfully)
ORPT: 14.25 (requests per solved task)
Total cost: $0.9122 (observed benchmark spend)
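
The ORPT card is a ratio of request volume to solves. Below is a minimal sketch of one plausible reading, assuming only requests spent on solved tasks count toward the numerator; the page does not define the accounting, and the 342/24 split is hypothetical:

```python
def orpt(requests_on_solved: int, solved: int) -> float:
    """Requests per solved task: a request-pressure measure."""
    if solved == 0:
        return float("inf")  # nothing solved: pressure is unbounded
    return requests_on_solved / solved

# Hypothetical split of the 375 observed requests (the page does not say
# how many landed on solved vs. unsolved tasks): 342 / 24 = 14.25.
print(orpt(342, 24))  # -> 14.25
```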
Baseline comparison

How the field moves relative to opencode/kimi-k2.5

These charts use opencode/kimi-k2.5 as zero. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it.
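
Concretely, each bar is the challenger's value minus the baseline's. A minimal sketch, with an assumed per-model dict layout (values copied from the decision table below):

```python
# Every delta plotted here is challenger minus opencode/kimi-k2.5.
BASELINE = {"composite": 0.785, "success": 0.89, "cost": 0.9122}

def delta_vs_baseline(challenger: dict, metric: str) -> float:
    return challenger[metric] - BASELINE[metric]

nano = {"composite": 0.789, "success": 0.85, "cost": 0.4215}
print(round(delta_vs_baseline(nano, "cost"), 4))     # -> -0.4907
print(round(delta_vs_baseline(nano, "success"), 2))  # -> -0.04, i.e. -4%
```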

Charts (one per metric): composite delta, success delta, cost delta, and wall time delta vs baseline.

Decision table

Field comparison against the baseline

Use this to decide whether another model beats opencode/kimi-k2.5 enough to justify the change.
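
One mechanical way to apply the table is to require a real composite gain at no extra cost. The thresholds in this sketch are illustrative assumptions, not part of ORPT-Bench:

```python
# Illustrative switch rule: challenger must gain composite and not cost more.
def worth_switching(composite_delta: float, cost_delta: float,
                    min_gain: float = 0.05, max_extra_cost: float = 0.0) -> bool:
    return composite_delta >= min_gain and cost_delta <= max_extra_cost

# opencode/gpt-5.4-nano: +0.005 composite, -$0.4907 cost -> gain too small.
print(worth_switching(0.005, -0.4907))  # -> False
```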

Model | Composite | Composite delta | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time
opencode/gpt-5.4-nano | 0.789 | +0.005 | 85% | -4% | 15.17 | +0.92 | $0.4215 | -$0.4907 | 27m 33s
opencode/kimi-k2.5 (baseline) | 0.785 | +0.000 | 89% | +0% | 14.25 | +0.00 | $0.9122 | +$0.0000 | 41m 05s
opencode/claude-opus-4-6 | 0.67 | -0.114 | 89% | +0% | 14.88 | +0.63 | $21.8757 | +$20.9634 | 40m 04s
opencode/glm-5 | 0.623 | -0.162 | 78% | -11% | 11.57 | -2.68 | $6.4339 | +$5.5217 | 20m 10s
opencode/big-pickle | 0.615 | -0.169 | 67% | -22% | 15.39 | +1.14 | $0.0000 | -$0.9122 | 36m 28s
opencode/gpt-5.4 | 0.609 | -0.176 | 78% | -11% | 11.00 | -3.25 | $8.9827 | +$8.0705 | 32m 47s
opencode/claude-sonnet-4-6 | 0.593 | -0.192 | 78% | -11% | 16.43 | +2.18 | $11.8406 | +$10.9283 | 42m 31s
opencode/glm-5.1 | 0.547 | -0.238 | 67% | -22% | 12.06 | -2.19 | $1.8816 | +$0.9694 | 64m 39s
opencode/minimax-m2.5 | 0.481 | -0.304 | 56% | -33% | 18.87 | +4.62 | $0.6413 | -$0.2710 | 32m 15s
opencode/gpt-5.4-mini | 0.425 | -0.360 | 48% | -41% | 9.54 | -4.71 | $1.0606 | +$0.1484 | 21m 48s
opencode/minimax-m2.5-free | 0.415 | -0.370 | 59% | -30% | 16.19 | +1.94 | $0.0000 | -$0.9122 | 41m 34s
opencode/gemini-3-flash | 0.415 | -0.370 | 59% | -30% | 21.81 | +7.56 | $2.4307 | +$1.5184 | 62m 52s
opencode/gemini-3.1-pro | 0.291 | -0.494 | 37% | -52% | 12.70 | -1.55 | $5.8536 | +$4.9414 | 51m 25s
opencode/nemotron-3-super-free | 0.181 | -0.603 | 26% | -63% | 19.43 | +5.18 | $0.0000 | -$0.9122 | 109m 00s
Task story

Where opencode/kimi-k2.5 separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin.
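
A sketch of that ordering, assuming per-task rows shaped like the table below; the field names, the single-solver flag, and the tie-breaking are assumptions, not the page's actual implementation:

```python
# Most revealing first: unsolved baseline results, then single-solver tasks,
# then the largest gaps to the winner.
def reveal_key(row: dict) -> tuple:
    unsolved = row["baseline_result"] in ("dnf", "failed")
    return (not unsolved, not row.get("single_solver", False), -row["gap"])

rows = [
    {"task": "K3s registry mirror trust repair", "baseline_result": "passed", "gap": 0.208},
    {"task": "Workspace transplant bundle repair", "baseline_result": "dnf", "gap": 0.985},
]
print([r["task"] for r in sorted(rows, key=reveal_key)][0])
# -> 'Workspace transplant bundle repair' (the unsolved task leads)
```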

Task | Field read | Baseline result | Winner | Winner score | Gap to winner | Baseline cost | Baseline time
Workspace transplant bundle repair | Competitive split | dnf | opencode/big-pickle | 0.985 | 0.985 | n/a | 5m 00s
RHEL edge firewalld router repair | Competitive split | failed | opencode/gpt-5.4-nano | 0.953 | 0.953 | $0.0198 | 43s
Log audit shell script | Competitive split | dnf | opencode/gpt-5.4-nano | 0.935 | 0.935 | n/a | 1m 15s
K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle | 1.0 | 0.208 | $0.0306 | 1m 04s
RHEL k3s node preparation repair | Competitive split | passed | opencode/gpt-5.4-nano | 1.0 | 0.202 | $0.0612 | 2m 26s
nftables router ingress repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.98 | 0.187 | $0.0369 | 1m 21s
Kubernetes rollout repair | Clear separation | passed | opencode/gpt-5.4-mini | 1.0 | 0.177 | $0.0495 | 2m 12s
Event status shell summary | Competitive split | passed | opencode/big-pickle | 1.0 | 0.14 | $0.0205 | 37s
Docker Compose observability fix | Competitive split | passed | opencode/gpt-5.4-nano | 0.975 | 0.139 | $0.0897 | 5m 00s
Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano | 0.942 | 0.124 | $0.0519 | 1m 29s
CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle | 0.964 | 0.09 | $0.0261 | 54s
MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle | 0.954 | 0.08 | $0.0328 | 50s
RHEL NetworkManager bridge VLAN repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.951 | 0.08 | $0.0248 | 50s
MetalLB ingress address pool repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.928 | 0.078 | $0.0428 | 1m 38s
GitOps workspace render validation | Competitive split | passed | opencode/big-pickle | 0.941 | 0.078 | $0.0347 | 1m 09s
Pre-ArgoCD bootstrap sequencing | Competitive split | passed | opencode/gpt-5.4-nano | 0.967 | 0.073 | $0.0420 | 2m 59s
Log level rollup shell script | Competitive split | passed | opencode/big-pickle | 0.965 | 0.072 | $0.0223 | 36s
Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.95 | 0.066 | $0.0380 | 1m 12s
AppArmor dnsmasq profile repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.918 | 0.058 | $0.0377 | 1m 37s
Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano | 0.932 | 0.047 | $0.0633 | 1m 42s
Ansible nginx role completion | Competitive split | passed | opencode/big-pickle | 0.963 | 0.045 | $0.0200 | 52s
SELinux registry volume label repair | Clear separation | passed | opencode/kimi-k2.5 | 1.0 | 0.0 | $0.0254 | 42s
Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 | 0.978 | 0.0 | $0.0152 | 41s
Bootstrap phase validation repair | Competitive split | passed | opencode/kimi-k2.5 | 0.993 | 0.0 | $0.0380 | 1m 30s
ExternalDNS RFC2136 repair | Competitive split | passed | opencode/kimi-k2.5 | 0.982 | 0.0 | $0.0299 | 46s
Traefik forwarded header trust repair | Competitive split | passed | opencode/kimi-k2.5 | 0.913 | 0.0 | $0.0331 | 1m 01s
Wildcard TLS route coverage | Competitive split | passed | opencode/kimi-k2.5 | 0.929 | 0.0 | $0.0262 | 1m 00s
Head to head

Direct matchups

Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Task records read baseline wins first (e.g. 24-1 means the baseline won 24 tasks and the challenger 1), and every edge is computed as baseline minus challenger: positive composite and success edges favor the baseline, while positive cost and time edges mean the baseline spends more or runs longer.
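
A quick consistency check of that convention against the opencode/claude-sonnet-4-6 row, using values copied from the tables on this page:

```python
# Edges read as baseline minus challenger.
baseline = {"composite": 0.785, "orpt": 14.25}
sonnet   = {"composite": 0.593, "orpt": 16.43}

print(round(baseline["composite"] - sonnet["composite"], 3))  # -> 0.192
print(round(baseline["orpt"] - sonnet["orpt"], 2))            # -> -2.18
```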

Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge
opencode/nemotron-3-super-free | 24-1 (2 ties) | +0.603 | +63% | +$0.9122 | -67m 56s | -5.18
opencode/gemini-3.1-pro | 24-1 (2 ties) | +0.494 | +52% | -$4.9414 | -10m 20s | +1.55
opencode/minimax-m2.5-free | 24-1 (2 ties) | +0.370 | +30% | +$0.9122 | -29s | -1.94
opencode/gemini-3-flash | 24-2 (1 tie) | +0.370 | +30% | -$1.5184 | -21m 47s | -7.56
opencode/gpt-5.4-mini | 19-7 (1 tie) | +0.360 | +41% | -$0.1484 | +19m 16s | +4.71
opencode/minimax-m2.5 | 18-8 (1 tie) | +0.304 | +33% | +$0.2710 | +8m 50s | -4.62
opencode/glm-5.1 | 23-2 (2 ties) | +0.238 | +22% | -$0.9694 | -23m 35s | +2.19
opencode/claude-sonnet-4-6 | 24-3 (0 ties) | +0.192 | +11% | -$10.9283 | -1m 26s | -2.18
opencode/gpt-5.4 | 24-2 (1 tie) | +0.176 | +11% | -$8.0705 | +8m 17s | +3.25
opencode/big-pickle | 12-14 (1 tie) | +0.169 | +22% | +$0.9122 | +4m 36s | -1.14
opencode/glm-5 | 23-4 (0 ties) | +0.162 | +11% | -$5.5217 | +20m 54s | +2.68
opencode/claude-opus-4-6 | 24-3 (0 ties) | +0.114 | +0% | -$20.9634 | +1m 00s | -0.63
opencode/gpt-5.4-nano | 8-19 (0 ties) | -0.005 | +4% | +$0.4907 | +13m 32s | -0.92
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 375
Wall time: 41m 05s
Average task cost: $0.0372
Benchmark support: unknown
Catalog blended price: $0.7170 / 1M tok
Catalog speed: n/a
Intelligence: n/a
Agentic: n/a

The primary blended price is derived automatically from the OpenRouter listing moonshotai/kimi-k2.5 using a 3:1 input:output blend.
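
For reference, a 3:1 blend weights input tokens three times as heavily as output tokens. A minimal sketch; the per-token rates below are hypothetical placeholders chosen to reproduce the listed $0.7170, not Moonshot's actual prices:

```python
# Blended $/1M tokens at a 3:1 input:output weighting.
def blended_price(input_per_1m: float, output_per_1m: float) -> float:
    return (3 * input_per_1m + output_per_1m) / 4

# Hypothetical rates: $0.456 in / $1.50 out -> $0.7170 blended.
print(round(blended_price(0.456, 1.50), 4))  # -> 0.717
```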