ORPT-Bench model detail
Model benchmark profile

opencode/minimax-m2.5-free

This page uses opencode/minimax-m2.5-free as the comparison baseline. Every chart and table below is intended to answer the same question: where this model leads, where it lags, and what it costs in quality, time, and request pressure.

Tags: minimax, free price tier, dev-cheap, dev-smoke
Composite: 0.415 (correctness-weighted overall standing)
Success: 59% (tasks completed successfully)
ORPT: 16.19 (requests per solved task)
Total cost: $0.0000 (observed benchmark spend)

Baseline comparison

How the field moves relative to opencode/minimax-m2.5-free

These charts use opencode/minimax-m2.5-free as zero. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it.
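
As a rough illustration of how these deltas are read, the sketch below recomputes a few of them from the rounded figures in the decision table. The helper name `delta_vs_baseline` and the dictionary layout are assumptions made for illustration, not part of ORPT-Bench. Because the inputs are already rounded, the last digit can differ slightly from what the page shows.

```python
# Baseline figures as shown on this page (rounded).
BASELINE = {"composite": 0.415, "success_pct": 59, "orpt": 16.19}

# Two rows copied from the decision table (rounded values).
FIELD = {
    "opencode/gpt-5.4-nano": {"composite": 0.789, "success_pct": 85, "orpt": 15.17},
    "opencode/nemotron-3-super-free": {"composite": 0.181, "success_pct": 26, "orpt": 19.43},
}


def delta_vs_baseline(model: dict, baseline: dict) -> dict:
    """Return model minus baseline for every baseline metric (hypothetical helper)."""
    return {key: round(model[key] - baseline[key], 3) for key in baseline}


for name, metrics in FIELD.items():
    print(name, delta_vs_baseline(metrics, BASELINE))
```

A positive composite or success delta favors the other model, while a positive ORPT or cost delta means the other model needed more requests or money than the baseline.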

[Charts: composite delta, success delta, cost delta, and wall time delta vs baseline]

Decision table

Field comparison against the baseline

Use this to decide whether another model beats opencode/minimax-m2.5-free enough to justify the change.

| Model | Composite | Delta vs baseline | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time |
|---|---|---|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 0.789 | +0.375 | 85% | +26% | 15.17 | -1.01 | $0.4215 | +$0.4215 | 27m 33s |
| opencode/kimi-k2.5 | 0.785 | +0.37 | 89% | +30% | 14.25 | -1.94 | $0.9122 | +$0.9122 | 41m 05s |
| opencode/claude-opus-4-6 | 0.67 | +0.255 | 89% | +30% | 14.88 | -1.31 | $21.8757 | +$21.8757 | 40m 04s |
| opencode/glm-5 | 0.623 | +0.208 | 78% | +19% | 11.57 | -4.62 | $6.4339 | +$6.4339 | 20m 10s |
| opencode/big-pickle | 0.615 | +0.2 | 67% | +7% | 15.39 | -0.80 | $0.0000 | +$0.0000 | 36m 28s |
| opencode/gpt-5.4 | 0.609 | +0.194 | 78% | +19% | 11.00 | -5.19 | $8.9827 | +$8.9827 | 32m 47s |
| opencode/claude-sonnet-4-6 | 0.593 | +0.178 | 78% | +19% | 16.43 | +0.24 | $11.8406 | +$11.8406 | 42m 31s |
| opencode/glm-5.1 | 0.547 | +0.132 | 67% | +7% | 12.06 | -4.13 | $1.8816 | +$1.8816 | 64m 39s |
| opencode/minimax-m2.5 | 0.481 | +0.066 | 56% | -4% | 18.87 | +2.68 | $0.6413 | +$0.6413 | 32m 15s |
| opencode/gpt-5.4-mini | 0.425 | +0.01 | 48% | -11% | 9.54 | -6.65 | $1.0606 | +$1.0606 | 21m 48s |
| opencode/minimax-m2.5-free (baseline) | 0.415 | +0.0 | 59% | +0% | 16.19 | +0.00 | $0.0000 | +$0.0000 | 41m 34s |
| opencode/gemini-3-flash | 0.415 | +0.0 | 59% | +0% | 21.81 | +5.63 | $2.4307 | +$2.4307 | 62m 52s |
| opencode/gemini-3.1-pro | 0.291 | -0.124 | 37% | -22% | 12.70 | -3.49 | $5.8536 | +$5.8536 | 51m 25s |
| opencode/nemotron-3-super-free | 0.181 | -0.233 | 26% | -33% | 19.43 | +3.24 | $0.0000 | +$0.0000 | 109m 00s |

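
One way to apply the decision table is to filter it by a minimum composite gain and a spending cap. The sketch below does that for a handful of rows. The thresholds (`min_composite_gain`, `max_cost_usd`) and the helper name `worth_switching` are illustrative assumptions, not ORPT-Bench recommendations.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    composite_delta: float  # model minus baseline
    cost_usd: float         # observed benchmark spend


# A few rows copied from the decision table.
ROWS = [
    Row("opencode/gpt-5.4-nano", 0.375, 0.4215),
    Row("opencode/kimi-k2.5", 0.37, 0.9122),
    Row("opencode/claude-opus-4-6", 0.255, 21.8757),
    Row("opencode/big-pickle", 0.2, 0.0),
    Row("opencode/gpt-5.4-mini", 0.01, 1.0606),
]


def worth_switching(rows, min_composite_gain=0.15, max_cost_usd=1.0):
    """Keep rows that clear the gain threshold without exceeding the cost cap."""
    keep = [
        r for r in rows
        if r.composite_delta >= min_composite_gain and r.cost_usd <= max_cost_usd
    ]
    # Largest composite gain first.
    return sorted(keep, key=lambda r: r.composite_delta, reverse=True)


for row in worth_switching(ROWS):
    print(f"{row.model}: +{row.composite_delta} composite for ${row.cost_usd:.4f}")
```

With these example thresholds the filter keeps gpt-5.4-nano, kimi-k2.5, and big-pickle; different thresholds will give a different shortlist.
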
Task story

Where opencode/minimax-m2.5-free separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin.

| Task | Field read | Baseline result | Winner (score) | Gap to winner | Baseline cost | Baseline time |
|---|---|---|---|---|---|---|
| RHEL k3s node preparation repair | Competitive split | failed | opencode/gpt-5.4-nano (1.0) | 1.0 | n/a | 1m 17s |
| Event status shell summary | Competitive split | dnf | opencode/big-pickle (1.0) | 1.0 | n/a | 45s |
| Bootstrap phase validation repair | Competitive split | failed | opencode/kimi-k2.5 (0.993) | 0.993 | n/a | 1m 00s |
| ExternalDNS RFC2136 repair | Competitive split | failed | opencode/kimi-k2.5 (0.982) | 0.982 | n/a | 1m 01s |
| nftables router ingress repair | Competitive split | failed | opencode/gpt-5.4-nano (0.98) | 0.98 | n/a | 1m 44s |
| Docker Compose observability fix | Competitive split | dnf | opencode/gpt-5.4-nano (0.975) | 0.975 | n/a | 5m 00s |
| RHEL edge firewalld router repair | Competitive split | failed | opencode/gpt-5.4-nano (0.953) | 0.953 | n/a | 37s |
| RHEL NetworkManager bridge VLAN repair | Competitive split | failed | opencode/gpt-5.4-nano (0.951) | 0.951 | n/a | 53s |
| Log audit shell script | Competitive split | dnf | opencode/gpt-5.4-nano (0.935) | 0.935 | n/a | 1m 15s |
| Wildcard TLS route coverage | Competitive split | dnf | opencode/kimi-k2.5 (0.929) | 0.929 | n/a | 5m 00s |
| Traefik forwarded header trust repair | Competitive split | failed | opencode/kimi-k2.5 (0.913) | 0.913 | n/a | 1m 10s |
| SELinux registry volume label repair | Clear separation | passed | opencode/kimi-k2.5 (1.0) | 0.3 | n/a | 1m 13s |
| K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle (1.0) | 0.3 | n/a | 45s |
| Kubernetes rollout repair | Clear separation | passed | opencode/gpt-5.4-mini (1.0) | 0.3 | n/a | 1m 35s |
| Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle (0.985) | 0.285 | n/a | 51s |
| Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 (0.978) | 0.278 | n/a | 1m 20s |
| Pre-ArgoCD bootstrap sequencing | Competitive split | passed | opencode/gpt-5.4-nano (0.967) | 0.267 | n/a | 1m 32s |
| Log level rollup shell script | Competitive split | passed | opencode/big-pickle (0.965) | 0.265 | n/a | 49s |
| CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle (0.964) | 0.264 | n/a | 1m 27s |
| Ansible nginx role completion | Competitive split | passed | opencode/big-pickle (0.963) | 0.263 | n/a | 1m 11s |
| MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle (0.954) | 0.254 | n/a | 1m 50s |
| Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano (0.95) | 0.25 | n/a | 2m 13s |
| Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano (0.942) | 0.242 | n/a | 1m 14s |
| GitOps workspace render validation | Competitive split | passed | opencode/big-pickle (0.941) | 0.241 | n/a | 1m 22s |
| Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano (0.932) | 0.232 | n/a | 1m 20s |
| MetalLB ingress address pool repair | Competitive split | passed | opencode/gpt-5.4-nano (0.928) | 0.228 | n/a | 1m 48s |
| AppArmor dnsmasq profile repair | Competitive split | passed | opencode/gpt-5.4-nano (0.918) | 0.218 | n/a | 1m 21s |

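
The baseline results in this table can be tallied to cross-check the headline success figure. The sketch below does that and, under the assumption (not stated on the page) that "Gap to winner" is the winner's score minus the baseline's score, backs out the baseline score a row would imply. The helper name `implied_baseline_score` is hypothetical.

```python
from collections import Counter

# Baseline results in table order: 11 unsolved rows, then 16 passes.
RESULTS = [
    "failed", "dnf", "failed", "failed", "failed", "dnf",
    "failed", "failed", "dnf", "dnf", "failed",
] + ["passed"] * 16

counts = Counter(RESULTS)
success_rate = counts["passed"] / len(RESULTS)
print(dict(counts))
print(f"success: {success_rate:.0%}")


def implied_baseline_score(winner_score: float, gap: float) -> float:
    """Assumes gap = winner score - baseline score (hypothetical helper)."""
    return round(winner_score - gap, 3)


# Workspace transplant bundle repair: winner 0.985, gap 0.285.
print(implied_baseline_score(0.985, 0.285))
```

The tally comes to 16 passes out of 27 tasks, which rounds to the 59% shown at the top of the page.
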
Head to head

Direct matchups

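Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Task records read as wins-losses from the baseline's side, and each edge is the baseline's value minus the challenger's, so a negative composite edge means the challenger scores higher.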

| Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge |
|---|---|---|---|---|---|---|
| opencode/gpt-5.4-nano | 2-23, 2 ties | -0.375 | -26% | -$0.4215 | +14m 01s | +1.01 |
| opencode/kimi-k2.5 | 1-24, 2 ties | -0.37 | -30% | -$0.9122 | +29s | +1.94 |
| opencode/claude-opus-4-6 | 2-24, 1 tie | -0.255 | -30% | -$21.8757 | +1m 29s | +1.31 |
| opencode/nemotron-3-super-free | 10-1, 16 ties | +0.233 | +33% | +$0.0000 | -67m 27s | -3.24 |
| opencode/glm-5 | 3-21, 3 ties | -0.208 | -19% | -$6.4339 | +21m 23s | +4.62 |
| opencode/big-pickle | 4-18, 5 ties | -0.2 | -7% | +$0.0000 | +5m 05s | +0.80 |
| opencode/gpt-5.4 | 1-21, 5 ties | -0.194 | -19% | -$8.9827 | +8m 46s | +5.19 |
| opencode/claude-sonnet-4-6 | 4-21, 2 ties | -0.178 | -19% | -$11.8406 | -57s | -0.24 |
| opencode/glm-5.1 | 3-18, 6 ties | -0.132 | -7% | -$1.8816 | -23m 06s | +4.13 |
| opencode/gemini-3.1-pro | 7-10, 10 ties | +0.124 | +22% | -$5.8536 | -9m 51s | +3.49 |
| opencode/minimax-m2.5 | 7-15, 5 ties | -0.066 | +4% | -$0.6413 | +9m 19s | -2.68 |
| opencode/gpt-5.4-mini | 6-13, 8 ties | -0.01 | +11% | -$1.0606 | +19m 45s | +6.65 |
| opencode/gemini-3-flash | 5-5, 17 ties | +0.0 | +0% | -$2.4307 | -21m 18s | -5.63 |

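
The time edges in this table appear to be the baseline's wall time minus the challenger's. The sketch below checks that reading against two rows, using wall times from the decision table. The sign convention is inferred from the numbers rather than documented, and the helper names (`parse_duration`, `format_edge`, `time_edge`) are hypothetical.

```python
import re


def parse_duration(text: str) -> int:
    """Convert strings such as '41m 34s' or '45s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(?:(\d+)s)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def format_edge(seconds: int) -> str:
    """Render a signed duration in the page's 'm s' style."""
    sign = "-" if seconds < 0 else "+"
    minutes, secs = divmod(abs(seconds), 60)
    return f"{sign}{minutes}m {secs:02d}s" if minutes else f"{sign}{secs}s"


def time_edge(baseline: str, challenger: str) -> str:
    """Baseline wall time minus challenger wall time (assumed sign convention)."""
    return format_edge(parse_duration(baseline) - parse_duration(challenger))


print(time_edge("41m 34s", "27m 33s"))  # opencode/gpt-5.4-nano
print(time_edge("41m 34s", "42m 31s"))  # opencode/claude-sonnet-4-6
```

Under this reading a positive time edge means the baseline took longer than the challenger. Some rows on the page differ from this calculation by a second, presumably because the page works from unrounded timings.
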
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 475
Wall time: 41m 34s
Average task cost: n/a
Benchmark support: unsupported
Catalog blended price: $0.0000 / 1M tok
Catalog speed: n/a
Intelligence: n/a
Agentic: n/a

The primary blended price is derived automatically from the OpenRouter listing minimax/minimax-m2.5:free using a 3:1 input:output blend. The reference price uses minimax/minimax-m2.5, from the same OpenRouter family, at 0.336 USD per 1M tokens.
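
A 3:1 input:output blend is commonly read as a weighted average that counts the input price three times for each count of the output price. The sketch below assumes that reading; the page does not spell out the formula. The function name `blended_price` is hypothetical, and the paid example uses made-up prices, not the actual minimax/minimax-m2.5 rates.

```python
def blended_price(input_usd_per_mtok: float, output_usd_per_mtok: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average of input and output prices, in USD per 1M tokens."""
    total = input_weight + output_weight
    return (input_weight * input_usd_per_mtok
            + output_weight * output_usd_per_mtok) / total


# A listing priced at zero on both sides blends to zero.
print(blended_price(0.0, 0.0))

# Hypothetical paid listing (illustrative prices only).
print(blended_price(0.20, 1.00))
```
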

This model was observed to trigger external_directory permission prompts in headless runs.