ORPT-Bench model detail
Model benchmark profile

opencode/gemini-3-flash

This page uses opencode/gemini-3-flash as the comparison baseline. Every chart and table below is intended to answer the same question: where this model leads, where it lags, and what it costs in quality, time, and request pressure.

Tags: google, low price tier, dev-cheap, dev-general

Composite: 0.415 (correctness-weighted overall standing)
Success: 59% (tasks completed successfully)
ORPT: 21.81 (requests per solved task)
Total cost: $2.4307 (observed benchmark spend)
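The four cards reduce to simple ratios over per-run counters. A minimal sketch follows, with illustrative counter names and toy values; the page does not publish the exact denominators behind each card.

```python
# Toy illustration of the four headline cards as simple ratios. The counter
# names and values are assumptions for illustration; the page does not
# publish the exact denominators behind each card.
requests, tasks, solved, cost_usd = 120, 10, 6, 1.50

success = solved / tasks            # "Success": share of tasks completed
orpt = requests / solved            # "ORPT": requests per solved task
avg_task_cost = cost_usd / tasks    # one plausible "Average task cost"

print(f"success={success:.0%} orpt={orpt:.2f} avg_task_cost=${avg_task_cost:.4f}")
```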
Baseline comparison

How the field moves relative to opencode/gemini-3-flash

These charts use opencode/gemini-3-flash as zero. Positive bars mean other models are above the baseline on that metric; negative bars mean they trail it.
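A minimal sketch of that transform, assuming each chart simply plots model-minus-baseline per metric. Values are copied from the decision table below; tiny mismatches against the printed deltas come from rounding of the underlying scores.

```python
# Sketch of the baseline-zero transform: plot (model - baseline) per metric.
# Values are copied from the decision table below; small mismatches against
# the printed deltas come from rounding of the underlying scores.
baseline = {"composite": 0.415, "success": 0.59, "cost_usd": 2.4307}
models = {
    "opencode/gpt-5.4-nano": {"composite": 0.789, "success": 0.85, "cost_usd": 0.4215},
    "opencode/kimi-k2.5":    {"composite": 0.785, "success": 0.89, "cost_usd": 0.9122},
}

for name, metrics in models.items():
    deltas = {k: round(metrics[k] - baseline[k], 4) for k in baseline}
    print(name, deltas)
```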

[Charts: Composite delta vs baseline · Success delta vs baseline · Cost delta vs baseline · Wall time delta vs baseline]

Decision table

Field comparison against the baseline

Use this to decide whether another model beats opencode/gemini-3-flash enough to justify the change.

Model | Composite | Composite delta | Success | Success delta | ORPT | ORPT delta | Cost | Cost delta | Wall time
opencode/gpt-5.4-nano | 0.789 | +0.375 | 85% | +26% | 15.17 | -6.64 | $0.4215 | -$2.0091 | 27m 33s
opencode/kimi-k2.5 | 0.785 | +0.37 | 89% | +30% | 14.25 | -7.56 | $0.9122 | -$1.5184 | 41m 05s
opencode/claude-opus-4-6 | 0.67 | +0.255 | 89% | +30% | 14.88 | -6.94 | $21.8757 | +$19.4450 | 40m 04s
opencode/glm-5 | 0.623 | +0.208 | 78% | +19% | 11.57 | -10.24 | $6.4339 | +$4.0033 | 20m 10s
opencode/big-pickle | 0.615 | +0.2 | 67% | +7% | 15.39 | -6.42 | $0.0000 | -$2.4307 | 36m 28s
opencode/gpt-5.4 | 0.609 | +0.194 | 78% | +19% | 11.00 | -10.81 | $8.9827 | +$6.5520 | 32m 47s
opencode/claude-sonnet-4-6 | 0.593 | +0.178 | 78% | +19% | 16.43 | -5.38 | $11.8406 | +$9.4099 | 42m 31s
opencode/glm-5.1 | 0.547 | +0.132 | 67% | +7% | 12.06 | -9.76 | $1.8816 | -$0.5490 | 64m 39s
opencode/minimax-m2.5 | 0.481 | +0.066 | 56% | -4% | 18.87 | -2.95 | $0.6413 | -$1.7894 | 32m 15s
opencode/gpt-5.4-mini | 0.425 | +0.01 | 48% | -11% | 9.54 | -12.27 | $1.0606 | -$1.3701 | 21m 48s
opencode/minimax-m2.5-free | 0.415 | +0.0 | 59% | +0% | 16.19 | -5.63 | $0.0000 | -$2.4307 | 41m 34s
opencode/gemini-3-flash (baseline) | 0.415 | +0.0 | 59% | +0% | 21.81 | +0.00 | $2.4307 | +$0.0000 | 62m 52s
opencode/gemini-3.1-pro | 0.291 | -0.124 | 37% | -22% | 12.70 | -9.11 | $5.8536 | +$3.4229 | 51m 25s
opencode/nemotron-3-super-free | 0.181 | -0.233 | 26% | -33% | 19.43 | -2.38 | $0.0000 | -$2.4307 | 109m 00s
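As a hedged sketch of how that judgment could be mechanized: the two thresholds below are arbitrary assumptions for illustration (ORPT-Bench does not define them), applied to the composite and cost deltas from the table above.

```python
# Hedged sketch of a switch decision over the table above. Both thresholds
# are arbitrary assumptions for illustration; ORPT-Bench does not define them.
MIN_COMPOSITE_EDGE = 0.10   # require a clear quality win over the baseline
MAX_EXTRA_COST = 1.00       # tolerate at most $1 extra total benchmark spend

rows = [  # (model, composite delta, cost delta) from the decision table
    ("opencode/gpt-5.4-nano", +0.375, -2.0091),
    ("opencode/kimi-k2.5", +0.37, -1.5184),
    ("opencode/claude-opus-4-6", +0.255, +19.4450),
    ("opencode/glm-5", +0.208, +4.0033),
]

for model, dq, dc in rows:
    if dq >= MIN_COMPOSITE_EDGE and dc <= MAX_EXTRA_COST:
        print(f"switch candidate: {model} ({dq:+.3f} composite, {dc:+.4f} USD)")
```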
Task story

Where opencode/gemini-3-flash separates

This table puts the most revealing tasks first: unsolved tasks, single-solver tasks, and tasks where the baseline trails the winner by a meaningful margin.

Task | Field read | Baseline result | Winner | Winner score | Gap to winner | Baseline cost | Baseline time
RHEL k3s node preparation repair | Competitive split | failed | opencode/gpt-5.4-nano | 1.0 | 1.0 | $0.0468 | 49s
Event status shell summary | Competitive split | dnf | opencode/big-pickle | 1.0 | 1.0 | n/a | 45s
Kubernetes rollout repair | Clear separation | failed | opencode/gpt-5.4-mini | 1.0 | 1.0 | $0.0523 | 1m 22s
nftables router ingress repair | Competitive split | failed | opencode/gpt-5.4-nano | 0.98 | 0.98 | $0.0821 | 1m 41s
Docker Compose observability fix | Competitive split | failed | opencode/gpt-5.4-nano | 0.975 | 0.975 | $0.0786 | 2m 26s
Pre-ArgoCD bootstrap sequencing | Competitive split | failed | opencode/gpt-5.4-nano | 0.967 | 0.967 | $0.1001 | 2m 03s
Log level rollup shell script | Competitive split | dnf | opencode/big-pickle | 0.965 | 0.965 | n/a | 1m 00s
Ansible nginx role completion | Competitive split | dnf | opencode/big-pickle | 0.963 | 0.963 | n/a | 5m 00s
RHEL NetworkManager bridge VLAN repair | Competitive split | failed | opencode/gpt-5.4-nano | 0.951 | 0.951 | $0.1243 | 2m 35s
Build workspace plane convergence | Competitive split | failed | opencode/gpt-5.4-nano | 0.942 | 0.942 | n/a | 5m 01s
Log audit shell script | Competitive split | dnf | opencode/gpt-5.4-nano | 0.935 | 0.935 | n/a | 1m 15s
SELinux registry volume label repair | Clear separation | passed | opencode/kimi-k2.5 | 1.0 | 0.3 | $0.0799 | 1m 15s
K3s registry mirror trust repair | Competitive split | passed | opencode/big-pickle | 1.0 | 0.3 | $0.2351 | 3m 40s
Bootstrap phase validation repair | Competitive split | passed | opencode/kimi-k2.5 | 0.993 | 0.293 | $0.0782 | 3m 18s
Workspace transplant bundle repair | Competitive split | passed | opencode/big-pickle | 0.985 | 0.285 | $0.1858 | 4m 10s
ExternalDNS RFC2136 repair | Competitive split | passed | opencode/kimi-k2.5 | 0.982 | 0.282 | $0.0818 | 1m 48s
Terraform static site repair | Competitive split | passed | opencode/kimi-k2.5 | 0.978 | 0.278 | $0.0568 | 1m 38s
CNPG restore manifest repair | Competitive split | passed | opencode/big-pickle | 0.964 | 0.264 | $0.0716 | 1m 09s
MCP OpenBao contract repair | Competitive split | passed | opencode/big-pickle | 0.954 | 0.254 | $0.1430 | 2m 20s
RHEL edge firewalld router repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.953 | 0.253 | $0.1025 | 1m 19s
Kubernetes OIDC RBAC repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.95 | 0.25 | $0.1416 | 2m 42s
GitOps workspace render validation | Competitive split | passed | opencode/big-pickle | 0.941 | 0.241 | $0.0882 | 1m 28s
Workspace runtime access convergence | Competitive split | passed | opencode/gpt-5.4-nano | 0.932 | 0.232 | $0.1889 | 2m 41s
Wildcard TLS route coverage | Competitive split | passed | opencode/kimi-k2.5 | 0.929 | 0.229 | $0.1207 | 4m 06s
MetalLB ingress address pool repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.928 | 0.228 | $0.1348 | 2m 14s
AppArmor dnsmasq profile repair | Competitive split | passed | opencode/gpt-5.4-nano | 0.918 | 0.218 | $0.1621 | 3m 29s
Traefik forwarded header trust repair | Competitive split | passed | opencode/kimi-k2.5 | 0.913 | 0.213 | $0.0754 | 1m 38s
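One plausible reading of the "most revealing first" ordering above, as a hedged sketch: tasks the baseline did not solve come first, then larger gaps to the winner. The page's exact sort key is not published, and the field names below are hypothetical.

```python
# One plausible sort key for the "most revealing first" ordering above:
# unsolved baseline tasks first, then larger gaps to the winning model.
# The page's exact tie-breaking is not published; field names are hypothetical.
def reveal_key(task: dict) -> tuple:
    unsolved = task["baseline_result"] in ("failed", "dnf")
    return (not unsolved, -task["gap_to_winner"])

tasks = [
    {"name": "Traefik forwarded header trust repair", "baseline_result": "passed", "gap_to_winner": 0.213},
    {"name": "Kubernetes rollout repair", "baseline_result": "failed", "gap_to_winner": 1.0},
    {"name": "SELinux registry volume label repair", "baseline_result": "passed", "gap_to_winner": 0.3},
]
for t in sorted(tasks, key=reveal_key):
    print(t["name"], t["gap_to_winner"])
```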
Head to head

Direct matchups

Pairwise task wins and top-line deltas show whether a challenger truly beats the baseline or just looks cheaper or faster in isolation. Records list the baseline's wins first, and every edge is the baseline's value minus the challenger's: positive composite and success edges favor the baseline, while positive cost, time, and ORPT edges mean the baseline spends more.

Challenger | Task record | Composite edge | Success edge | Cost edge | Time edge | ORPT edge
opencode/gpt-5.4-nano | 3-23 (1 tie) | -0.375 | -26% | +$2.0091 | +35m 19s | +6.64
opencode/kimi-k2.5 | 2-24 (1 tie) | -0.37 | -30% | +$1.5184 | +21m 47s | +7.56
opencode/claude-opus-4-6 | 1-24 (2 ties) | -0.255 | -30% | -$19.4450 | +22m 47s | +6.94
opencode/nemotron-3-super-free | 12-3 (12 ties) | +0.233 | +33% | +$2.4307 | -46m 09s | +2.38
opencode/glm-5 | 3-21 (3 ties) | -0.208 | -19% | -$4.0033 | +42m 41s | +10.24
opencode/big-pickle | 5-18 (4 ties) | -0.2 | -7% | +$2.4307 | +26m 23s | +6.42
opencode/gpt-5.4 | 2-21 (4 ties) | -0.194 | -19% | -$6.5520 | +30m 04s | +10.81
opencode/claude-sonnet-4-6 | 4-21 (2 ties) | -0.178 | -19% | -$9.4099 | +20m 21s | +5.38
opencode/glm-5.1 | 4-18 (5 ties) | -0.132 | -7% | +$0.5490 | -1m 48s | +9.76
opencode/gemini-3.1-pro | 9-10 (8 ties) | +0.124 | +22% | -$3.4229 | +11m 27s | +9.11
opencode/minimax-m2.5 | 5-15 (7 ties) | -0.066 | +4% | +$1.7894 | +30m 37s | +2.95
opencode/gpt-5.4-mini | 10-13 (4 ties) | -0.01 | +11% | +$1.3701 | +41m 03s | +12.27
opencode/minimax-m2.5-free | 5-5 (17 ties) | +0.0 | +0% | +$2.4307 | +21m 18s | +5.63
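A minimal sketch of how these columns can be derived under the sign convention above (baseline minus challenger, baseline wins listed first); the per-task scores below are made up for illustration.

```python
# Sketch of the head-to-head columns under the sign convention above
# (baseline minus challenger, baseline wins listed first). The per-task
# scores are made up for illustration.
baseline_scores = {"task-a": 1.0, "task-b": 0.0, "task-c": 0.7}
challenger_scores = {"task-a": 1.0, "task-b": 0.9, "task-c": 0.5}

wins = losses = ties = 0
for task, b in baseline_scores.items():
    c = challenger_scores[task]
    if b > c:
        wins += 1      # baseline outscored the challenger on this task
    elif b < c:
        losses += 1
    else:
        ties += 1

# Top-line edges are plain differences, e.g. the composite edge against
# opencode/gpt-5.4-nano: 0.415 - 0.789 (printed above as -0.375 after rounding).
composite_edge = 0.415 - 0.789
print(f"record {wins}-{losses} ({ties} tie), composite edge {composite_edge:+.3f}")
```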
Model context

Benchmark and catalog detail

The benchmark result only matters in context: this section pairs the observed benchmark outcome with the catalog metadata and operating characteristics behind it.

Requests: 508
Wall time: 62m 52s
Average task cost: $0.1217
Benchmark support: limited
Catalog blended price: $1.1000 / 1M tok
Catalog speed: 179 tok/s
Intelligence: 46
Agentic: n/a

OpenRouter reference blend for google/gemini-3-flash-preview is 1.125 USD per 1M tokens using a 3:1 input:output mix.
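The 3:1 blend works out as a simple weighted mean. A worked sketch follows; the component prices are hypothetical, chosen only to reproduce the $1.125 reference, since the actual input/output prices are not listed here.

```python
# Worked check of the 3:1 blend convention stated above:
#   blend = (3 * input_price + output_price) / 4
# The component prices below are hypothetical, chosen only to reproduce the
# $1.125 / 1M reference; the actual input/output prices are not listed here.
def blended_price(input_per_m: float, output_per_m: float, in_ratio: float = 3.0) -> float:
    return (in_ratio * input_per_m + output_per_m) / (in_ratio + 1.0)

print(blended_price(0.50, 3.00))  # -> 1.125 (USD per 1M tokens)
```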

The model was observed to loop and hit the benchmark process deadline on the task-05 scripting smoke run, so it should not be included in the default headless dev matrix.