No. Not at GA. This is the single most important thing to get right, because half the "Gemini 3.5 Pro benchmarks" content circulating right now is projecting Flash's numbers onto a model whose model card does not yet exist.
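Here's the verifiable timeline: Google announced Gemini 3.5 Pro at I/O on May 19, 2026. It is in limited Vertex AI preview for select enterprise customers only. GA is targeted for sometime in June 2026, with no committed date.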
As of this writing (June 5), nothing has changed that status. There is no spec sheet, no benchmark card, no pricing tier, and no general API access for Gemini 3.5 Pro. If you see a "94% on benchmark X" claim for 3.5 Pro right now, it is either extrapolated from Gemini 3.1 Pro / 3.5 Flash or invented. Treat it as such.
So the useful question isn't "how good is 3.5 Pro" — nobody outside Google can answer that yet. It's "what's the evidence base, and what should I watch for at GA."
Flash is the reason 3.5 Pro is interesting. The whole point of a "Pro" tier is that it sits above Flash — so Flash's GA numbers set the floor for what Pro has to clear.
Gemini 3.5 Flash shipped May 19, 2026 (gemini-3.5-flash), and the headline was genuinely unusual: a Flash-tier model leading the previous Pro tier on agentic benchmarks. Verified specs from Google's model card and independent coverage:
| Spec / Benchmark | Gemini 3.5 Flash | Source basis |
|---|---|---|
| Context window | 1M tokens | Google model card |
| Max output | 64K tokens | Google model card |
| Modalities | Text, image, audio, video, PDF in | Google model card |
| Terminal-Bench 2.1 | 76.2% | Google / independent |
| MCP Atlas | 83.6% | Google / independent |
| GDPval-AA | 1656 Elo | |
| CharXiv Reasoning | 84.2% | |
| Pricing (in / out) | $1.50 / $9.00 per 1M | Google API pricing |
| Cached input | $0.15 per 1M (90% off) | Google API pricing |
The thing to internalize: Flash already beats Gemini 3.1 Pro (Google's February 2026 flagship) on Terminal-Bench 2.1, MCP Atlas, and GDPval-AA. That's the bar 3.5 Pro is built to exceed. If Pro merely matched Flash on agentic coding, the tier wouldn't justify itself — so Google is effectively committing to a model that pushes past 76% Terminal-Bench and 83% MCP Atlas, with Deep Think reasoning layered on top.
Where Flash still trails: pure abstract reasoning. Flash sits around 72.1% on ARC-AGI-2 versus Gemini 3.1 Pro's verified 77.1%. That reasoning gap is exactly the territory a Deep Think-equipped Pro model is designed to reclaim.
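The pricing rows in the Flash table above are easy to sanity-check with a few lines of arithmetic. The following is a minimal sketch, not Google's billing logic: the function name `flash_cost` is hypothetical, and it assumes that cached tokens are a subset of the input tokens and are billed at the cached rate while the remainder is billed at the full input rate.

```python
# Rates from the Gemini 3.5 Flash table above, in USD per 1M tokens.
FLASH_INPUT = 1.50
FLASH_CACHED_INPUT = 0.15
FLASH_OUTPUT = 9.00


def flash_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is the part of input_tokens served from cache
    and billed at the cached rate; the rest is billed at the full input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * FLASH_INPUT
        + cached_tokens * FLASH_CACHED_INPUT
        + output_tokens * FLASH_OUTPUT
    )
    return total / 1_000_000


# The "90% off" figure in the table follows directly from the two input rates.
cache_discount = 1 - FLASH_CACHED_INPUT / FLASH_INPUT
```

Under these assumptions, a 100K-token prompt with 80K tokens cached and a 2K-token reply comes to roughly $0.06, most of it split between the uncached input and the output.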
Stripping out the speculation, here's what Google itself has stated about Gemini 3.5 Pro — framed as targets and positioning, not measured results:
Google is targeting a 2M-token context window, Deep Think reasoning layered on top, and GA sometime in June 2026. The model ID is gemini-3.5-pro (visible in Vertex preview).
Every one of those is a vendor-stated target. None is an independently reproduced benchmark, because the model card doesn't exist yet. I'm flagging that explicitly because the whole value of this piece is not pretending otherwise.
For a sense of the lineage these targets sit on, Gemini 3.1 Pro (the current shipped Pro, GA February 2026) is the honest baseline:
| Gemini 3.1 Pro (verified, shipped) | Value |
|---|---|
| Context window | 1M tokens |
| ARC-AGI-2 | 77.1% |
| GPQA Diamond | 94.3% |
| SWE-bench Verified | 80.6% |
| MMMU-Pro | 80.5% |
| Pricing (in / out) | $2 / $12 per 1M (tiered: $4 / $18 above 200K) |
3.5 Pro is the model that's supposed to beat that line while adding Deep Think and (targeted) 2M context. Until GA, that's the most defensible way to think about it.
This is the table everyone wants, so here it is with a hard rule: because the model is still in preview, Gemini 3.5 Pro's column is marked "TBD at GA," not filled with guesses. The other models' numbers are verified from their GA releases and independent trackers.
| Benchmark | Gemini 3.5 Pro | Gemini 3.5 Flash | Claude Opus 4.8 | GPT-5.5 | Grok 4.3 |
|---|---|---|---|---|---|
| ARC-AGI-2 | TBD at GA | 72.1% | 75.8% | 85.0% | not published |
| SWE-bench Pro | TBD at GA | — | 69.2% | 58.6% | — |
| Terminal-Bench 2.1 | TBD at GA | 76.2% | 74.6% | 78.2% | — |
| MCP Atlas | TBD at GA | 83.6% | — | — | — |
| GPQA Diamond | TBD at GA | — | 93.6% | 93.5% | — |
| GDPval-AA (Elo) | TBD at GA | 1656 | 1890 | ~1769 | — |
| Context window | 2M (target) | 1M | 1M | 922K | 1M |
Reading this honestly:
The interesting strategic question at GA: does 3.5 Pro chase GPT-5.5's ARC-AGI-2 crown via Deep Think, or does Google double down on the agentic-coding + multimodal + 2M-context lane where it's already differentiated? My read is the latter — Flash's numbers tell you where this generation's engineering went.
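One way to read the comparison table without eyeballing it is to compute the current leader per benchmark from the published cells only. The sketch below copies the table's numbers; the `SCORES` layout and the `leader` helper are hypothetical names, unpublished or TBD cells are stored as `None`, and the GPT-5.5 GDPval-AA value is the table's approximate figure.

```python
from typing import Optional

# Scores copied from the comparison table above.
# None marks cells that are TBD or not published.
SCORES: dict[str, dict[str, Optional[float]]] = {
    "ARC-AGI-2": {"Gemini 3.5 Pro": None, "Gemini 3.5 Flash": 72.1,
                  "Claude Opus 4.8": 75.8, "GPT-5.5": 85.0, "Grok 4.3": None},
    "SWE-bench Pro": {"Gemini 3.5 Pro": None, "Gemini 3.5 Flash": None,
                      "Claude Opus 4.8": 69.2, "GPT-5.5": 58.6, "Grok 4.3": None},
    "Terminal-Bench 2.1": {"Gemini 3.5 Pro": None, "Gemini 3.5 Flash": 76.2,
                           "Claude Opus 4.8": 74.6, "GPT-5.5": 78.2, "Grok 4.3": None},
    "MCP Atlas": {"Gemini 3.5 Pro": None, "Gemini 3.5 Flash": 83.6,
                  "Claude Opus 4.8": None, "GPT-5.5": None, "Grok 4.3": None},
    "GPQA Diamond": {"Gemini 3.5 Pro": None, "Gemini 3.5 Flash": None,
                     "Claude Opus 4.8": 93.6, "GPT-5.5": 93.5, "Grok 4.3": None},
    # GPT-5.5's value is listed as approximate (~1769) in the table.
    "GDPval-AA (Elo)": {"Gemini 3.5 Pro": None, "Gemini 3.5 Flash": 1656,
                        "Claude Opus 4.8": 1890, "GPT-5.5": 1769, "Grok 4.3": None},
}


def leader(benchmark: str) -> Optional[tuple[str, float]]:
    """Return (model, score) for the best published score, or None if none exist."""
    published = {m: s for m, s in SCORES[benchmark].items() if s is not None}
    if not published:
        return None
    best = max(published, key=published.get)
    return best, published[best]


if __name__ == "__main__":
    for name in SCORES:
        print(name, leader(name))
```

On the table's numbers this reports GPT-5.5 for ARC-AGI-2 and Terminal-Bench 2.1, Claude Opus 4.8 for SWE-bench Pro, GPQA Diamond, and GDPval-AA, and Gemini 3.5 Flash for MCP Atlas, where it is the only model with a published score. Gemini 3.5 Pro cannot lead anything until its cells stop being `None`.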
Unconfirmed. No pricing tier has been published for 3.5 Pro. The only honest statement: it will be announced at GA.
That said, Google's Pro-tier pricing has been remarkably stable, so the shape is predictable even if the number isn't:
| Model | Input / 1M | Output / 1M | Notes |
|---|---|---|---|
| Gemini 3.5 Flash | $1.50 | $9.00 | Cached $0.15; verified |
| Gemini 3.1 Pro | $2.00 | $12.00 | Tiered to $4/$18 above 200K; verified |
| Gemini 3.5 Pro | TBD | TBD | Announced at GA; expect tiered, context-length-dependent |
If history holds, expect context-length-dependent tiered pricing (a higher rate above a long-context threshold, the way 3.1 Pro jumps at 200K) and a likely premium over 3.1 Pro for the 2M window and Deep Think. But I'm not going to put a fake dollar figure in a table. Wait for the model card.
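To make "context-length-dependent tiered pricing" concrete, here is a minimal sketch using Gemini 3.1 Pro's verified rates from the table above. It says nothing about 3.5 Pro's eventual price. The function name is hypothetical, and the billing rule is an assumption: once the prompt exceeds 200K tokens, the whole request (input and output) is charged at the higher tier.

```python
# Gemini 3.1 Pro rates from the pricing table, in USD per 1M tokens.
LONG_CONTEXT_THRESHOLD = 200_000
STANDARD_RATES = (2.00, 12.00)   # (input, output) at or below the threshold
LONG_RATES = (4.00, 18.00)       # (input, output) above the threshold


def gemini_31_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: a prompt larger than the threshold moves the whole request,
    input and output alike, onto the higher tier.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = LONG_RATES
    else:
        in_rate, out_rate = STANDARD_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

With these rates a 100K-token prompt and a 10K-token reply cost about $0.32, while a 300K-token prompt with the same reply costs about $1.38. Tripling the prompt more than quadruples the bill, which is why the long-context threshold matters when budgeting for a 2M-token window.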
For most teams, today, the answer is Flash — because it's the only one of the two you can actually deploy. But the routing logic at GA is straightforward:
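Use Gemini 3.5 Flash when you need to ship now: it is GA, priced, and documented, and it already leads the previous Pro tier on agentic coding.
Wait for Gemini 3.5 Pro when your workload hits Flash's ceiling on hard reasoning, needs the 2M context window, or involves heavy video/audio multimodal reasoning.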
The honest framing: Flash already absorbed most of what used to require Pro. Pro 3.5 has to earn its slot on the hardest reasoning and the longest context — not on general agentic coding, where Flash already leads the prior Pro tier. That's a higher bar than a normal Pro release, and it's why the GA benchmarks matter more than usual.
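The routing logic described above can be sketched as a small function. This is an illustration under stated assumptions, not a recommended production router: the `Workload` fields and `pick_model` are hypothetical names, the three "wait for Pro" triggers are the ones this article names, and the model ID strings are the ones quoted in the text.

```python
from dataclasses import dataclass

FLASH = "gemini-3.5-flash"
PRO = "gemini-3.5-pro"
FLASH_CONTEXT_LIMIT = 1_000_000  # Flash's context window in tokens


@dataclass
class Workload:
    """Hypothetical description of what a job needs."""
    context_tokens: int
    needs_hard_reasoning: bool = False
    heavy_av_multimodal: bool = False


def pick_model(workload: Workload, pro_is_ga: bool = False) -> str:
    """Route to Pro only when it is deployable and the workload needs it."""
    exceeds_flash = workload.context_tokens > FLASH_CONTEXT_LIMIT
    wants_pro = (
        exceeds_flash
        or workload.needs_hard_reasoning
        or workload.heavy_av_multimodal
    )
    if wants_pro and pro_is_ga:
        return PRO
    if exceeds_flash:
        # Nothing deployable fits: Flash tops out at 1M tokens.
        raise ValueError("context exceeds Flash's 1M window and Pro is not GA")
    return FLASH
```

The default `pro_is_ga=False` encodes today's situation: every workload that fits in 1M tokens routes to Flash, and flipping the flag at GA is the only change needed to start sending the hardest jobs to Pro.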
A few practical takeaways while we wait:
Gemini 3.5 Pro is not at GA. As of June 5, 2026, it is in limited Vertex AI preview for select enterprise customers only. There is no public model card, no published benchmarks, and no general API pricing. Google announced it at I/O on May 19, 2026, with GA targeted for sometime in June 2026. Sundar Pichai's phrasing was "give us until next month," with no committed date.
The model ID is gemini-3.5-pro, visible in the Vertex AI preview. The general-availability API name should match at GA.
Google is targeting a 2M-token context window, double Gemini 3.5 Flash's 1M. This is a stated target, not a confirmed spec, until the GA model card lands. For comparison, the currently shipped Gemini 3.1 Pro has a verified 1M-token window.
Unconfirmed. Pricing will be announced at GA. Google's Pro tier has historically used context-length-dependent tiered pricing — Gemini 3.1 Pro runs $2/$12 per 1M, rising to $4/$18 above 200K tokens. Expect a similar tiered structure, likely with a premium for the larger context window and Deep Think. Any specific dollar figure circulating now is speculation.
How 3.5 Pro compares with Claude Opus 4.8 and GPT-5.5 is unknown, because it hasn't been benchmarked publicly. What's verified: as of late May 2026, Claude Opus 4.8 leads aggregate intelligence (Artificial Analysis Index 61.4) and SWE-bench Pro (69.2%), while GPT-5.5 leads ARC-AGI-2 (85.0%). Gemini 3.5 Flash already beats the prior Gemini Pro tier on agentic coding (Terminal-Bench 2.1 76.2%, MCP Atlas 83.6%). Where 3.5 Pro lands against Opus and GPT-5.5 is precisely the open question at GA.
If you need to ship now, use Flash — it's GA, priced, and documented, and it already leads the previous Pro tier on agentic coding at $1.50/$9 per 1M. Wait for Pro only if your workload hits Flash's ceiling on hard reasoning, needs the 2M context window, or involves heavy video/audio multimodal reasoning. For most agentic-coding workloads, Flash is the pragmatic pick today.
At GA, watch ARC-AGI-2 and SWE-bench Pro, the two areas where Gemini has historically trailed Opus and GPT-5.5. Also watch whether Deep Think reasoning numbers are reported separately, and whether independent trackers (Artificial Analysis, llm-stats, LMArena) reproduce Google's claimed figures before you trust them in production.
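The "reproduced by independent trackers" check can be written down as a simple gate. This is a minimal sketch: `is_reproduced` is a hypothetical helper, and both the one-point tolerance and the two-tracker minimum are arbitrary assumptions you would tune for your own risk tolerance.

```python
def is_reproduced(
    vendor_claim: float,
    tracker_scores: dict[str, float],
    tolerance: float = 1.0,
    min_trackers: int = 2,
) -> bool:
    """Return True only if enough independent trackers agree with the claim.

    Assumptions: scores are percentages on the same benchmark version, and
    "agree" means within `tolerance` points of the vendor's figure.
    """
    if len(tracker_scores) < min_trackers:
        return False
    return all(
        abs(score - vendor_claim) <= tolerance
        for score in tracker_scores.values()
    )
```

A call such as `is_reproduced(80.0, {"tracker_a": 79.5, "tracker_b": 80.4})` passes, while a single tracker or one outlier fails the gate. Those numbers are illustrative placeholders, not real tracker results.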
