ORPT-Bench model detail
Model benchmark profile

opencode/minimax-m2.5

This page uses opencode/minimax-m2.5 as the comparison baseline. Every chart and table below is intended to answer the same question: where this model leads, where it lags, and what it costs in quality, time, and request pressure.

Tags: minimax · low price tier · dev-cheap · dev-general
Composite: 0.481 (correctness-weighted overall standing)
Success: 56% (tasks completed successfully)
ORPT: 18.87 (requests per solved task)
Total cost: $0.6413 (observed benchmark spend)
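
ORPT reads as requests per solved task, so the headline numbers can be cross-checked against the raw request count reported under Model context below. A minimal sketch of that arithmetic, assuming ORPT is simply total requests divided by solved tasks (the solved-task count is back-derived here, not published on this page):

```python
# Assumption: ORPT = total_requests / solved_tasks. The benchmark may
# count retries or tool calls differently; treat this as a sanity check.
total_requests = 417   # "Requests" under Model context
orpt = 18.87           # requests per solved task (headline figure)

implied_solved_tasks = total_requests / orpt
print(f"implied solved tasks: {implied_solved_tasks:.1f}")  # ~22.1
```
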
Baseline comparison

How the field moves relative to opencode/minimax-m2.5

These charts use opencode/minimax-m2.5 as zero. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it.
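
The delta convention is plain subtraction: each bar is the challenger's metric minus the baseline's. A minimal sketch using the opencode/gpt-5.4-nano row from the decision table below:

```python
# Positive delta = challenger above baseline on that metric.
baseline = {"composite": 0.481, "success": 0.56, "cost_usd": 0.6413}
nano = {"composite": 0.789, "success": 0.85, "cost_usd": 0.4215}

deltas = {k: round(nano[k] - baseline[k], 4) for k in baseline}
print(deltas)
# {'composite': 0.308, 'success': 0.29, 'cost_usd': -0.2198}
# Matches the decision table up to rounding of the published figures.
```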

[Charts: Composite delta vs baseline · Success delta vs baseline · Cost delta vs baseline · Wall time delta vs baseline]

Decision table

Field comparison against the baseline

Use this table to decide whether another model beats opencode/minimax-m2.5 by enough to justify switching; one illustrative decision rule is sketched after the table.

Model Composite Composite delta Success Success delta ORPT ORPT delta Cost Cost delta Wall time
opencode/gpt-5.4-nano 0.789 +0.308 85% +30% 15.17 -3.69 $0.4215 -$0.2197 27m 33s
opencode/kimi-k2.5 0.785 +0.304 89% +33% 14.25 -4.62 $0.9122 +$0.2710 41m 05s
opencode/claude-opus-4-6 0.670 +0.189 89% +33% 14.88 -3.99 $21.8757 +$21.2344 40m 04s
opencode/glm-5 0.623 +0.142 78% +22% 11.57 -7.30 $6.4339 +$5.7927 20m 10s
opencode/big-pickle 0.615 +0.134 67% +11% 15.39 -3.48 $0.0000 -$0.6413 36m 28s
opencode/gpt-5.4 0.609 +0.128 78% +22% 11.00 -7.87 $8.9827 +$8.3414 32m 47s
opencode/claude-sonnet-4-6 0.593 +0.112 78% +22% 16.43 -2.44 $11.8406 +$11.1993 42m 31s
opencode/glm-5.1 0.547 +0.066 67% +11% 12.06 -6.81 $1.8816 +$1.2404 64m 39s
opencode/minimax-m2.5 (baseline) 0.481 +0.000 56% +0% 18.87 +0.00 $0.6413 +$0.0000 32m 15s
opencode/gpt-5.4-mini 0.425 -0.056 48% -7% 9.54 -9.33 $1.0606 +$0.4193 21m 48s
opencode/minimax-m2.5-free 0.415 -0.066 59% +4% 16.19 -2.68 $0.0000 -$0.6413 41m 34s
opencode/gemini-3-flash 0.415 -0.066 59% +4% 21.81 +2.95 $2.4307 +$1.7894 62m 52s
opencode/gemini-3.1-pro 0.291 -0.190 37% -19% 12.70 -6.17 $5.8536 +$5.2123 51m 25s
opencode/nemotron-3-super-free 0.181 -0.299 26% -30% 19.43 +0.56 $0.0000 -$0.6413 109m 00s
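
The illustrative decision rule mentioned above can be as simple as a threshold filter: demand a composite gain that survives the extra spend. Both cutoffs below are assumptions for the sketch, not part of ORPT-Bench:

```python
# Illustrative switch rule: composite gain >= 0.10 and at most $1 of
# extra benchmark spend. Tune both cutoffs to your workload.
candidates = [
    # (model, composite delta, cost delta in USD) from the table above
    ("opencode/gpt-5.4-nano",    0.308, -0.2197),
    ("opencode/kimi-k2.5",       0.304, +0.2710),
    ("opencode/claude-opus-4-6", 0.189, +21.2344),
]

MIN_COMPOSITE_GAIN = 0.10  # assumption
MAX_EXTRA_COST_USD = 1.00  # assumption

upgrades = [name for name, gain, extra in candidates
            if gain >= MIN_COMPOSITE_GAIN and extra <= MAX_EXTRA_COST_USD]
print(upgrades)  # claude-opus-4-6 drops out on its +$21.23 cost delta
```
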
Task story

Where opencode/minimax-m2.5 separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin. Gap to winner is the winner's score minus the baseline's score on that task; the arithmetic is sketched after the table.

Task Field read Baseline result Winner Winner score Gap to winner Baseline cost Baseline time
SELinux registry volume label repair Clear separation failed opencode/kimi-k2.5 1.0 1.0 $0.0098 33s
RHEL k3s node preparation repair Competitive split failed opencode/gpt-5.4-nano 1.0 1.0 $0.0161 1m 06s
Event status shell summary Competitive split dnf opencode/big-pickle 1.0 1.0 n/a 45s
Kubernetes rollout repair Clear separation failed opencode/gpt-5.4-mini 1.0 1.0 $0.0232 1m 01s
ExternalDNS RFC2136 repair Competitive split failed opencode/kimi-k2.5 0.982 0.982 $0.0144 45s
Docker Compose observability fix Competitive split failed opencode/gpt-5.4-nano 0.975 0.975 $0.0124 37s
Pre-ArgoCD bootstrap sequencing Competitive split failed opencode/gpt-5.4-nano 0.967 0.967 $0.0171 57s
Ansible nginx role completion Competitive split failed opencode/big-pickle 0.963 0.963 $0.0196 51s
Kubernetes OIDC RBAC repair Competitive split failed opencode/gpt-5.4-nano 0.95 0.95 $0.0135 57s
Log audit shell script Competitive split failed opencode/gpt-5.4-nano 0.935 0.935 $0.0139 44s
Workspace runtime access convergence Competitive split failed opencode/gpt-5.4-nano 0.932 0.932 $0.0306 1m 32s
MetalLB ingress address pool repair Competitive split failed opencode/gpt-5.4-nano 0.928 0.928 $0.0183 1m 04s
RHEL NetworkManager bridge VLAN repair Competitive split passed opencode/gpt-5.4-nano 0.951 0.206 $0.0676 2m 55s
Wildcard TLS route coverage Competitive split passed opencode/kimi-k2.5 0.929 0.168 $0.0719 2m 36s
Workspace transplant bundle repair Competitive split passed opencode/big-pickle 0.985 0.137 $0.0244 51s
Build workspace plane convergence Competitive split passed opencode/gpt-5.4-nano 0.942 0.133 $0.0418 1m 34s
nftables router ingress repair Competitive split passed opencode/gpt-5.4-nano 0.98 0.131 $0.0181 1m 00s
Terraform static site repair Competitive split passed opencode/kimi-k2.5 0.978 0.116 $0.0170 1m 11s
Log level rollup shell script Competitive split passed opencode/big-pickle 0.965 0.112 $0.0225 54s
K3s registry mirror trust repair Competitive split passed opencode/big-pickle 1.0 0.094 $0.0118 33s
Bootstrap phase validation repair Competitive split passed opencode/kimi-k2.5 0.993 0.093 $0.0404 2m 06s
RHEL edge firewalld router repair Competitive split passed opencode/gpt-5.4-nano 0.953 0.049 $0.0302 1m 48s
CNPG restore manifest repair Competitive split passed opencode/big-pickle 0.964 0.049 $0.0168 1m 03s
Traefik forwarded header trust repair Competitive split passed opencode/kimi-k2.5 0.913 0.037 $0.0286 1m 53s
AppArmor dnsmasq profile repair Competitive split passed opencode/gpt-5.4-nano 0.918 0.021 $0.0232 1m 21s
GitOps workspace render validation Competitive split passed opencode/big-pickle 0.941 0.02 $0.0193 50s
MCP OpenBao contract repair Competitive split passed opencode/big-pickle 0.954 0.015 $0.0187 47s
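
As noted above, Gap to winner is the winner's score minus the baseline's score on the same task, which is why every failed task shows a gap equal to the winner's full score. A minimal sketch of that reading, with the baseline's score on the passed task back-derived from the row above rather than published:

```python
# Gap to winner = winner_score - baseline_score. On a failed task the
# baseline scored 0, so the gap equals the winner's score outright.
def gap_to_winner(winner_score: float, baseline_score: float) -> float:
    return round(winner_score - baseline_score, 3)

# SELinux repair: baseline failed (score 0), kimi-k2.5 scored 1.0.
print(gap_to_winner(1.0, 0.0))      # 1.0
# NetworkManager repair: baseline passed with an implied score of 0.745.
print(gap_to_winner(0.951, 0.745))  # 0.206
```
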
Head to head

Direct matchups

Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Records and edges read from the baseline's side: a 6-18 record means opencode/minimax-m2.5 won 6 of the 27 tasks and lost 18 against that challenger, and a negative edge means the baseline trails on that metric. The correspondence with the decision table is sketched after this one.

Challenger Task record Composite edge Success edge Cost edge Time edge ORPT edge
opencode/gpt-5.4-nano 6-18 (3 ties) -0.308 -30% +$0.2197 +4m 42s +3.69
opencode/kimi-k2.5 8-18 (1 tie) -0.304 -33% -$0.2710 -8m 50s +4.62
opencode/nemotron-3-super-free 15-1 (11 ties) +0.299 +30% +$0.6413 -76m 45s -0.56
opencode/gemini-3.1-pro 15-3 (9 ties) +0.190 +19% -$5.2123 -19m 10s +6.17
opencode/claude-opus-4-6 14-10 (3 ties) -0.189 -33% -$21.2344 -7m 49s +3.99
opencode/glm-5 13-9 (5 ties) -0.142 -22% -$5.7927 +12m 05s +7.30
opencode/big-pickle 6-15 (6 ties) -0.134 -11% +$0.6413 -4m 13s +3.48
opencode/gpt-5.4 13-11 (3 ties) -0.128 -22% -$8.3414 -32s +7.87
opencode/claude-sonnet-4-6 14-8 (5 ties) -0.112 -22% -$11.1993 -10m 16s +2.44
opencode/minimax-m2.5-free 15-7 (5 ties) +0.066 -4% +$0.6413 -9m 19s +2.68
opencode/gemini-3-flash 15-5 (7 ties) +0.066 -4% -$1.7894 -30m 37s -2.95
opencode/glm-5.1 13-8 (6 ties) -0.066 -11% -$1.2404 -32m 24s +6.81
opencode/gpt-5.4-mini 12-8 (7 ties) +0.056 +7% -$0.4193 +10m 27s +9.33
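
Because edges run baseline minus challenger, each edge is the sign-flipped delta from the decision table. A minimal sketch of that correspondence:

```python
# Edge = baseline metric - challenger metric; the decision table's
# deltas run the other way, so the signs flip.
baseline_composite = 0.481
challengers = {
    "opencode/gpt-5.4-nano": 0.789,
    "opencode/nemotron-3-super-free": 0.181,
}

for name, composite in challengers.items():
    print(f"{name}: edge {baseline_composite - composite:+.3f}")
# opencode/gpt-5.4-nano: edge -0.308
# opencode/nemotron-3-super-free: edge +0.300 (table: +0.299, from unrounded inputs)
```
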
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 417
Wall time: 32m 15s
Average task cost: $0.0302
Benchmark support: unknown
Catalog blended price: $0.3360 / 1M tok
Catalog speed: n/a
Intelligence: n/a
Agentic: n/a

The catalog blended price is derived automatically from the OpenRouter listing minimax/minimax-m2.5 using a 3:1 input:output blend.
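
A 3:1 input:output blend weights the input price three times as heavily as the output price. A minimal sketch of the derivation; the per-token prices are hypothetical placeholders chosen to reproduce the catalog figure, not the actual OpenRouter listing:

```python
# Blended price per 1M tokens under a 3:1 input:output weighting.
# Hypothetical prices; substitute the live minimax/minimax-m2.5 listing.
input_price_per_1m = 0.200   # $ per 1M input tokens (hypothetical)
output_price_per_1m = 0.744  # $ per 1M output tokens (hypothetical)

blended = (3 * input_price_per_1m + output_price_per_1m) / 4
print(f"${blended:.4f} / 1M tok")  # $0.3360 with these placeholders
```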