ORPT-Bench model detail
Model benchmark profile

opencode/gemini-3.1-pro

This page uses opencode/gemini-3.1-pro as the comparison baseline. Every chart and table below is intended to answer the same question: where this model leads, where it lags, and what it costs in quality, time, and request pressure.

google · medium price tier · dev-cheap · dev-general
Composite: 0.291 (correctness-weighted overall standing)
Success: 37% (tasks completed successfully)
ORPT: 12.70 (requests per solved task)
Total cost: $5.8536 (observed benchmark spend)
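
The Success and ORPT cards are rate metrics. A minimal sketch of one plausible way to derive them from per-task results, assuming each record carries a pass flag, a request count, and a cost, and reading ORPT as total requests divided by solved-task count (the composite weighting is not reproduced here):

    # Hypothetical per-task records: (passed, requests, cost_usd).
    tasks = [
        (True, 14, 0.21),
        (False, 9, 0.18),
        (True, 11, 0.25),
    ]

    solved = sum(1 for passed, _, _ in tasks if passed)
    success_rate = solved / len(tasks)                 # "Success" card
    orpt = sum(reqs for _, reqs, _ in tasks) / solved  # "ORPT" card (assumed definition)
    total_cost = sum(cost for _, _, cost in tasks)     # "Total cost" card
    print(f"success={success_rate:.0%}  orpt={orpt:.2f}  cost=${total_cost:.4f}")
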
Baseline comparison

How the field moves relative to opencode/gemini-3.1-pro

These charts treat opencode/gemini-3.1-pro as the zero line. Positive bars mean a model sits above the baseline on that metric; negative bars mean it trails the baseline.

Composite delta vs baseline

Success delta vs baseline

Cost delta vs baseline

Wall time delta vs baseline
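
A minimal sketch of the baseline-relative deltas these charts plot. The dict layout is illustrative rather than the benchmark's internal schema; the example values are the opencode/gpt-5.4-nano and baseline figures from the decision table below:

    # Deltas are challenger minus baseline, so positive bars sit above the baseline.
    baseline = {"composite": 0.291, "success": 0.37, "cost_usd": 5.8536}
    challenger = {"composite": 0.789, "success": 0.85, "cost_usd": 0.4215}

    deltas = {m: round(challenger[m] - baseline[m], 4) for m in baseline}
    print(deltas)  # {'composite': 0.498, 'success': 0.48, 'cost_usd': -5.4321}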

Decision table

Field comparison against the baseline

Use this table to decide whether another model beats opencode/gemini-3.1-pro by enough to justify switching.

Model | Composite | Delta vs baseline | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time
opencode/gpt-5.4-nano | 0.789 | +0.498 | 85% | +48% | 15.17 | +2.47 | $0.4215 | -$5.4321 | 27m 33s
opencode/kimi-k2.5 | 0.785 | +0.494 | 89% | +52% | 14.25 | +1.55 | $0.9122 | -$4.9414 | 41m 05s
opencode/claude-opus-4-6 | 0.67 | +0.379 | 89% | +52% | 14.88 | +2.18 | $21.8757 | +$16.0221 | 40m 04s
opencode/glm-5 | 0.623 | +0.332 | 78% | +41% | 11.57 | -1.13 | $6.4339 | +$0.5803 | 20m 10s
opencode/big-pickle | 0.615 | +0.324 | 67% | +30% | 15.39 | +2.69 | $0.0000 | -$5.8536 | 36m 28s
opencode/gpt-5.4 | 0.609 | +0.318 | 78% | +41% | 11.00 | -1.70 | $8.9827 | +$3.1291 | 32m 47s
opencode/claude-sonnet-4-6 | 0.593 | +0.302 | 78% | +41% | 16.43 | +3.73 | $11.8406 | +$5.9869 | 42m 31s
opencode/glm-5.1 | 0.547 | +0.256 | 67% | +30% | 12.06 | -0.64 | $1.8816 | -$3.9720 | 64m 39s
opencode/minimax-m2.5 | 0.481 | +0.19 | 56% | +19% | 18.87 | +6.17 | $0.6413 | -$5.2123 | 32m 15s
opencode/gpt-5.4-mini | 0.425 | +0.134 | 48% | +11% | 9.54 | -3.16 | $1.0606 | -$4.7930 | 21m 48s
opencode/minimax-m2.5-free | 0.415 | +0.124 | 59% | +22% | 16.19 | +3.49 | $0.0000 | -$5.8536 | 41m 34s
opencode/gemini-3-flash | 0.415 | +0.124 | 59% | +22% | 21.81 | +9.11 | $2.4307 | -$3.4229 | 62m 52s
opencode/gemini-3.1-pro (baseline) | 0.291 | +0.0 | 37% | +0% | 12.70 | +0.00 | $5.8536 | +$0.0000 | 51m 25s
opencode/nemotron-3-super-free | 0.181 | -0.109 | 26% | -11% | 19.43 | +6.73 | $0.0000 | -$5.8536 | 109m 00s
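
One way to read the table as a decision is a simple screen: demand a meaningful composite and success gain, then cap the extra spend you will tolerate. A minimal sketch; the thresholds are illustrative choices, not anything ORPT-Bench prescribes:

    def justifies_switch(row, min_composite_gain=0.10, max_extra_cost_usd=2.00):
        """Screen a challenger row: better composite and success, acceptable cost delta."""
        return (
            row["composite_delta"] >= min_composite_gain
            and row["success_delta"] > 0
            and row["cost_delta"] <= max_extra_cost_usd
        )

    # Row values copied from the table above (opencode/kimi-k2.5 vs the baseline).
    kimi_k25 = {"composite_delta": 0.494, "success_delta": 0.52, "cost_delta": -4.9414}
    print(justifies_switch(kimi_k25))  # True
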
Task story

Where opencode/gemini-3.1-pro separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin.

Task | Field read | Baseline result | Winner (score) | Gap to winner | Baseline cost | Baseline time
SELinux registry volume label repair | Clear separation | failed | opencode/kimi-k2.5 (1.0) | 1.0 | $0.0385 | 19s
RHEL k3s node preparation repair | Competitive split | failed | opencode/gpt-5.4-nano (1.0) | 1.0 | $0.1624 | 1m 44s
Event status shell summary | Competitive split | dnf | opencode/big-pickle (1.0) | 1.0 | n/a | 45s
Kubernetes rollout repair | Clear separation | failed | opencode/gpt-5.4-mini (1.0) | 1.0 | $0.1657 | 1m 35s
Bootstrap phase validation repair | Competitive split | failed | opencode/kimi-k2.5 (0.993) | 0.993 | $0.3025 | 2m 19s
ExternalDNS RFC2136 repair | Competitive split | failed | opencode/kimi-k2.5 (0.982) | 0.982 | $0.1398 | 1m 02s
nftables router ingress repair | Competitive split | failed | opencode/gpt-5.4-nano (0.98) | 0.98 | $0.2268 | 1m 50s
Pre-ArgoCD bootstrap sequencing | Competitive split | failed | opencode/gpt-5.4-nano (0.967) | 0.967 | $0.3456 | 2m 43s
Log level rollup shell script | Competitive split | dnf | opencode/big-pickle (0.965) | 0.965 | n/a | 1m 00s
RHEL edge firewalld router repair | Competitive split | failed | opencode/gpt-5.4-nano (0.953) | 0.953 | $0.1203 | 1m 09s
RHEL NetworkManager bridge VLAN repair | Competitive split | failed | opencode/gpt-5.4-nano (0.951) | 0.951 | $0.1941 | 1m 33s
Log audit shell script | Competitive split | dnf | opencode/gpt-5.4-nano (0.935) | 0.935 | n/a | 1m 15s
Workspace runtime access convergence | Competitive split | failed | opencode/gpt-5.4-nano (0.932) | 0.932 | $0.5298 | 3m 12s
Wildcard TLS route coverage | Competitive split | failed | opencode/kimi-k2.5 (0.929) | 0.929 | $0.1606 | 1m 42s
MetalLB ingress address pool repair | Competitive split | failed | opencode/gpt-5.4-nano (0.928) | 0.928 | $0.0619 | 34s
AppArmor dnsmasq profile repair | Competitive split | failed | opencode/gpt-5.4-nano (0.918) | 0.918 | $0.2495 | 2m 08s
Traefik forwarded header trust repair | Competitive split | failed | opencode/kimi-k2.5 (0.913) | 0.913 | $0.4734 | 4m 42s
K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle (1.0) | 0.227 | $0.1489 | 52s
MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle (0.954) | 0.209 | $0.4583 | 4m 54s
Build workspace plane convergence | Competitive split | passed | opencode/gpt-5.4-nano (0.942) | 0.208 | $0.6600 | 3m 55s
CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle (0.964) | 0.199 | $0.2791 | 1m 33s
Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle (0.985) | 0.185 | $0.1576 | 1m 00s
Ansible nginx role completion | Competitive split | passed | opencode/big-pickle (0.963) | 0.185 | $0.1680 | 1m 34s
Docker Compose observability fix | Competitive split | passed | opencode/gpt-5.4-nano (0.975) | 0.163 | $0.3295 | 3m 57s
Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano (0.95) | 0.147 | $0.2299 | 1m 29s
Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 (0.978) | 0.146 | $0.0872 | 1m 20s
GitOps workspace render validation | Competitive split | passed | opencode/big-pickle (0.941) | 0.133 | $0.1641 | 1m 21s
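
The row order described above can be reproduced with a two-part sort key: baseline failures and dnf rows first, then the widest gap to the winner (the single-solver criterion is not modeled here). A minimal sketch; the field names are assumptions about the task records, not the benchmark's schema:

    # Most revealing rows first: unsolved before passed, larger gaps before smaller ones.
    def story_key(task):
        unsolved = task["baseline_result"] in ("failed", "dnf")
        return (not unsolved, -task["gap_to_winner"])

    tasks = [
        {"name": "Terraform static site repair", "baseline_result": "passed", "gap_to_winner": 0.146},
        {"name": "Kubernetes rollout repair", "baseline_result": "failed", "gap_to_winner": 1.0},
        {"name": "K3s registry mirror trust repair", "baseline_result": "passed", "gap_to_winner": 0.227},
    ]
    for task in sorted(tasks, key=story_key):
        print(task["name"])
    # Kubernetes rollout repair, K3s registry mirror trust repair, Terraform static site repair
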
Head to head

Direct matchups

Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Task records count baseline wins first, and each edge column is baseline minus challenger, so a negative composite edge means the challenger scores higher.

Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge
opencode/gpt-5.4-nano | 1-22 (4 ties) | -0.498 | -48% | +$5.4321 | +23m 52s | -2.47
opencode/kimi-k2.5 | 1-24 (2 ties) | -0.494 | -52% | +$4.9414 | +10m 20s | -1.55
opencode/claude-opus-4-6 | 8-17 (2 ties) | -0.379 | -52% | -$16.0221 | +11m 21s | -2.18
opencode/glm-5 | 4-19 (4 ties) | -0.332 | -41% | -$0.5803 | +31m 15s | +1.13
opencode/big-pickle | 3-17 (7 ties) | -0.324 | -30% | +$5.8536 | +14m 57s | -2.69
opencode/gpt-5.4 | 5-17 (5 ties) | -0.318 | -41% | -$3.1291 | +18m 38s | +1.70
opencode/claude-sonnet-4-6 | 8-15 (4 ties) | -0.302 | -41% | -$5.9869 | +8m 54s | -3.73
opencode/glm-5.1 | 4-16 (7 ties) | -0.256 | -30% | +$3.9720 | -13m 14s | +0.64
opencode/minimax-m2.5 | 3-15 (9 ties) | -0.19 | -19% | +$5.2123 | +19m 10s | -6.17
opencode/gpt-5.4-mini | 3-12 (12 ties) | -0.134 | -11% | +$4.7930 | +29m 37s | +3.16
opencode/minimax-m2.5-free | 10-7 (10 ties) | -0.124 | -22% | +$5.8536 | +9m 51s | -3.49
opencode/gemini-3-flash | 10-9 (8 ties) | -0.124 | -22% | +$3.4229 | -11m 27s | -9.11
opencode/nemotron-3-super-free | 10-1 (16 ties) | +0.109 | +11% | +$5.8536 | -57m 35s | -6.73
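
A minimal sketch of how a pairwise record can be tallied from per-task scores, assuming one score per task per model and an exact-tie rule (the benchmark's actual tie handling is not documented on this page):

    def head_to_head(baseline_scores, challenger_scores):
        """Tally (baseline wins, baseline losses, ties) over the shared task set."""
        wins = losses = ties = 0
        for task, baseline_score in baseline_scores.items():
            challenger_score = challenger_scores[task]
            if baseline_score > challenger_score:
                wins += 1
            elif baseline_score < challenger_score:
                losses += 1
            else:
                ties += 1
        return wins, losses, ties

    # Toy scores; with the real 27-task results this would reproduce records like 10-1 (16 ties).
    print(head_to_head({"a": 0.9, "b": 0.3, "c": 0.5}, {"a": 0.2, "b": 0.3, "c": 0.8}))  # (1, 1, 1)
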
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 307
Wall time: 51m 25s
Average task cost: $0.2683
Benchmark support: unknown
Catalog blended price: $4.5000 / 1M tok
Catalog speed: 127 tok/s
Intelligence: 57
Agentic: n/a

OpenRouter reference blend for google/gemini-3.1-pro-preview is 4.5 USD per 1M tokens using a 3:1 input:output mix.
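
That blend is a 3:1 weighted average of the per-direction prices. A worked sketch with illustrative rates chosen only so the arithmetic lands on the catalog figure; the model's real per-direction prices are not listed on this page:

    # Blended $/1M tokens under a 3:1 input:output mix.
    input_price_per_m = 2.00    # illustrative assumption, $/1M input tokens
    output_price_per_m = 12.00  # illustrative assumption, $/1M output tokens

    blended = (3 * input_price_per_m + 1 * output_price_per_m) / 4
    print(blended)  # 4.5, matching the $4.5000 / 1M tok catalog price above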