GPT-5.6 Sol, Terra & Luna
Full Review, Benchmarks & Pricing (2026)
OpenAI dropped its biggest model family of 2026 on June 26: GPT-5.6 Sol, Terra, and Luna. Sol dethrones Claude Mythos 5 on the TerminalBench 2.1 coding leaderboard with a record 91.9% score. All three models hit OpenAI's "High" cybersecurity threshold — a first for an entire product line. But there's a catch: due to a U.S. government request, only about 20 vetted organizations can access the models right now. Broad availability is expected within weeks. Here's everything you need to know.
Published June 26, 2026 · Updated June 27, 2026 · Sources: OpenAI Official Blog · OpenAI Deployment Safety System Card · VentureBeat · SiliconAngle · TechTimes · Wired
01 · Quick Summary
| Model | Best For | Input | Output | Highlight |
|---|---|---|---|---|
| GPT-5.6 Sol | Flagship / maximum capability | $5 / 1M | $30 / 1M | TerminalBench 2.1 #1 at 91.9% |
| GPT-5.6 Terra | Balanced / production workhorse | $2.50 / 1M | $15 / 1M | GPT-5.5-level performance, 50% cheaper |
| GPT-5.6 Luna | Lightweight / high-frequency | $1 / 1M | $6 / 1M | 80% cheaper than Sol; High cyber rating |
| Current status | Limited preview (~20 approved partners). General release expected within weeks (July 2026). | | | |
OpenAI's June 26 launch is the most significant model release since GPT-5.5 — and the first family where every tier, including entry-level Luna, crossed OpenAI's internal "High" cybersecurity risk rating. Sol's Ultra multi-agent mode reclaimed the TerminalBench crown from Claude Mythos 5 after just 17 days at the top. The catch: a U.S. government security review limits access to roughly 20 vetted organizations until broad rollout.
02 · Three Pain Points for Developers Waiting on GPT-5.6
- You cannot benchmark what you cannot call. GPT-5.6 is live for ~20 approved partners only. Teams building on `gpt-5.5` or routing through Claude Opus 4.8 after the Fable 5 export ban have no public API endpoint to regression-test against yet. Prediction markets price July broad release at 87%, but your sprint planning cannot ride Polymarket odds alone.
- Ultra mode scores are not your invoice. Sol's record 91.9% TerminalBench score runs in Ultra multi-agent mode, which uses significantly more tokens than standard inference. Budgeting Sol at headline benchmark performance without modeling token multiplication will blow cost projections when you ship agent workflows at scale.
- Government gatekeeping adds routing risk. All three frontier labs were blocked in June 2026: OpenAI is in limited preview, Anthropic was forced offline, and Google delayed Gemini 3.5 Pro. If your stack assumes uninterrupted access to the latest model tier, the new normal is policy-driven availability windows, not just API rate limits. See our June 2026 release roundup for how fast the landscape shifted in one month.
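The Ultra-mode budgeting point above can be made concrete with a small cost model. This is a minimal sketch, not an official calculator: the prices are the listed Sol rates from the summary table, while the workload figures and the `ultra_multiplier` values are hypothetical placeholders, since the article does not say how many more tokens Ultra mode consumes.

```python
# Listed GPT-5.6 Sol prices in USD per 1M tokens (from the summary table).
SOL_INPUT_PER_M = 5.00
SOL_OUTPUT_PER_M = 30.00


def monthly_cost(requests, input_tokens, output_tokens, ultra_multiplier=1.0):
    """Estimate monthly Sol spend in USD.

    ultra_multiplier is an ASSUMPTION: a guess at how many times more tokens
    an Ultra multi-agent run consumes than a standard call (1.0 = standard).
    """
    total_in = requests * input_tokens * ultra_multiplier
    total_out = requests * output_tokens * ultra_multiplier
    return (total_in * SOL_INPUT_PER_M + total_out * SOL_OUTPUT_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100k agent runs, 8k input + 2k output tokens each.
    for multiplier in (1.0, 3.0, 5.0):
        cost = monthly_cost(100_000, 8_000, 2_000, multiplier)
        print(f"multiplier {multiplier}: ${cost:,.2f}")
```

Running the projection at several multipliers, rather than a single guess, gives a cost range you can tighten once real Ultra-mode usage data is available.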
03 · What Is GPT-5.6? The Solar System Naming Explained
GPT-5.6 is OpenAI's newest frontier model series, named after celestial bodies for the first time:
- Sol (the Sun) — Flagship, maximum capability for complex coding, cybersecurity research, and long-horizon agent workflows
- Terra (the Earth) — Balanced performance and cost for enterprise document analysis, customer support, and high-volume API calls
- Luna (the Moon) — Fast, lightweight, affordable tier for summarization, drafting, and routine automation
The release was not smooth. Following President Trump's June 2 executive order on AI model safety, OpenAI was asked to limit GPT-5.6's launch during a government review period, the first time the U.S. government has formally asked an AI company to restrict a frontier model's release. OpenAI CEO Sam Altman complied while publicly pushing back:
"We don't believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them."
04 · GPT-5.6 Model Comparison at a Glance
| Model | Best For | Input Price | Output Price | Context Window |
|---|---|---|---|---|
| Sol | Complex coding, security research, long-horizon agents | $5 / 1M tokens | $30 / 1M tokens | ~1.5M tokens |
| Terra | High-volume business tasks, document analysis | $2.50 / 1M tokens | $15 / 1M tokens | ~1.5M tokens |
| Luna | Summarization, drafting, routine automation | $1 / 1M tokens | $6 / 1M tokens | ~1.5M tokens |
Note: Terra delivers GPT-5.5-level performance at half the price. Luna costs 80% less than Sol while still receiving a "High" cybersecurity rating — the first non-flagship OpenAI model to earn High in both cybersecurity and biology domains.
05 · GPT-5.6 Sol: Max Mode & Ultra Mode
Sol is OpenAI's most capable model to date. Beyond raw performance, it introduces two reasoning modes that did not exist before:
Max Mode
Sol takes additional time to reason before responding — "slow thinking" that trades latency for accuracy. Ideal when you need the answer to be right, not just fast: high-stakes code review, security analysis, or multi-step planning where a wrong first pass costs more than waiting.
Ultra Mode
This is the game-changer. Instead of a single model working through a problem, Ultra mode spawns multiple subagents that split the task, execute in parallel, and merge their results. This multi-agent architecture is why Sol achieved its TerminalBench record of 91.9%. It does consume significantly more tokens, so reserve Ultra for genuinely complex tasks — not every API call in your agent loop.
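The split, parallel-execute, merge pattern described above can be illustrated with a toy fan-out/fan-in sketch. This is not OpenAI's implementation, which the article does not describe in detail; `run_subagent`, `ultra_style_run`, and the `split`/`merge` callbacks are hypothetical names, and the subagent here is a stub rather than a model call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(subtask):
    """Stand-in for one subagent; a real system would call a model here."""
    return f"done: {subtask}"


def ultra_style_run(task, split, merge, max_workers=4):
    """Split a task, run the pieces in parallel, then merge the results."""
    subtasks = split(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the subtasks.
        results = list(pool.map(run_subagent, subtasks))
    return merge(results)


if __name__ == "__main__":
    plan = "write parser; write tests; write docs"
    output = ultra_style_run(
        plan,
        split=lambda t: [s.strip() for s in t.split(";")],
        merge=lambda rs: "\n".join(rs),
    )
    print(output)
```

Each subtask is a separate call, which is why token use grows with the number of subagents even when wall-clock time does not.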
06 · GPT-5.6 Benchmark Results: The Numbers That Matter
Coding: TerminalBench 2.1
TerminalBench 2.1 tests multi-step command-line planning with 89 complex programming challenges — closer to real-world agent tasks than traditional code completion benchmarks.
| Model | Score | Mode |
|---|---|---|
| GPT-5.6 Sol | 91.9% 🏆 New #1 | Ultra (multi-agent) |
| GPT-5.6 Sol | 88.8% | Standard |
| Claude Mythos 5 | 88.0% | Standard |
| GPT-5.5 | 83.4% | Standard |
| Gemini 3.1 Pro Preview | 70.7% | Standard |
Claude Mythos 5 had held the top spot for only 17 days (since June 9) before Sol came along.
Long-Horizon Agents: Agent's Last Exam
| Model | Task Completion Rate (Code Mode) |
|---|---|
| GPT-5.6 Sol | 50.9% — Only model to cross 50% |
| GPT-5.6 Luna | Slightly above GPT-5.5 |
Cybersecurity: CTF & ExploitBench
GPT-5.6 is the first OpenAI model family where all three tiers hit the "High" cybersecurity classification.
Capture-The-Flag (CTF) evaluation:
| Model | Hit Rate |
|---|---|
| Sol | 96.7% |
| Terra | 91.84% |
| Luna | 85.19% |
ExploitBench (vulnerability research): Sol matches Anthropic's Mythos Preview on ExploitBench while using only ~1/3 of the output tokens. That's the same security research capability at dramatically lower cost.
Safety note: OpenAI's red-teaming confirmed Sol cannot autonomously engineer a complete, functional exploit chain against real-world hardened targets (Chromium, Firefox). It stays below OpenAI's "Cyber Critical" threshold.
Life Sciences: GeneBench v1 & HealthBench
- GeneBench v1 (genomics & quantitative biology): Sol matches or exceeds GPT-5.5 using fewer tokens
- HealthBench Professional: Sol scores 60.5, which is 8.7 points above GPT-5.5
07 · GPT-5.6 vs Claude Mythos 5: Which Is Better for Coding?
This is the comparison everyone is asking about. Here's the honest breakdown:
| Category | GPT-5.6 Sol | Claude Mythos 5 |
|---|---|---|
| TerminalBench 2.1 | 91.9% (Ultra) / 88.8% standard ✅ | 88.0% |
| ExploitBench | Near-identical score using ~1/3 of the output tokens ✅ | Strong (restricted access) |
| Pricing | $5 input / $30 output ✅ | $10 input / $50 output (offline) |
| Availability | Limited preview → general release soon | Currently offline (U.S. export control) |
| Context Window | ~1.5M tokens ✅ | 200K tokens |
Bottom line: Sol beats Mythos 5 on TerminalBench and offers comparable security research capability at a fraction of the cost. However, Mythos 5 may still lead on benchmarks like SWE-Bench Pro (where GPT-5.6 system card data hasn't been fully published yet). We'll update this comparison once OpenAI releases the complete benchmark report. For routing alternatives while Mythos stays offline, see our AI coding assistant comparison.
08 · The Government Restriction: Why Can't I Access GPT-5.6 Yet?
What happened
On June 2, 2026, President Trump signed an executive order allowing U.S. government agencies up to 30 days of pre-release access to review frontier AI models for national security concerns.
On June 26, following a White House request coordinated by the Office of Science and Technology Policy (OSTP) and the Office of the National Cyber Director (ONCD), OpenAI agreed to limit GPT-5.6's launch to approximately 20 pre-approved "trusted partner" organizations.
Why it matters
This is the first time the U.S. government has formally asked an AI company to restrict a model's release, setting a precedent that could reshape how frontier models are deployed globally.
OpenAI publicly pushed back even while complying, saying it does not believe this kind of government access process should become the long-term default.
Context: the "Big Three" all got blocked in June
| Company | Model | Status |
|---|---|---|
| OpenAI | GPT-5.6 Sol/Terra/Luna | Limited preview (~20 orgs) |
| Anthropic | Claude Fable 5 / Mythos 5 | Forced offline June 12 via export control |
| Google | Gemini 3.5 Pro | Delayed to July |
June 2026 was supposed to be the biggest month in AI history. Instead, all three flagship releases got blocked at the door.
09 · GPT-5.6 on Cerebras: 750 Tokens Per Second
Starting in July, OpenAI is launching Sol on Cerebras hardware. The headline number: 750 tokens per second.
Most frontier models today: 50–150 tokens/second
GPT-5.6 Sol on Cerebras: 750 tokens/second (5× to 15× faster)
Example: a response that takes 10 seconds at 50 tokens/second (about 500 tokens) would stream in under 1 second at 750 tokens/second.
For applications like real-time coding assistants, interactive agents, or live customer-facing AI, this isn't just a speed bump; it's a category change. Initial access will be limited to select enterprise customers as Cerebras expands capacity. Pair this with OpenAI's Jalapeño custom inference chip roadmap and inference economics shift on two fronts: faster tokens and cheaper silicon.
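The throughput figures above reduce to simple division, sketched here for your own response sizes. The function names are illustrative, and the model deliberately ignores network latency and time-to-first-token, so real end-to-end times will be higher.

```python
def generation_seconds(output_tokens, tokens_per_second):
    """Time to stream a response, ignoring network and time-to-first-token."""
    return output_tokens / tokens_per_second


def speedup(new_tps, old_tps):
    """How many times faster new_tps is than old_tps."""
    return new_tps / old_tps


if __name__ == "__main__":
    CEREBRAS_TPS = 750  # headline figure quoted for Sol on Cerebras
    for baseline_tps in (50, 150):  # typical range quoted for frontier models
        tokens = 10 * baseline_tps  # a response that takes 10 s today
        print(
            f"{baseline_tps} tok/s baseline: {speedup(CEREBRAS_TPS, baseline_tps):.0f}x faster, "
            f"10 s becomes {generation_seconds(tokens, CEREBRAS_TPS):.2f} s"
        )
```

The sub-second result only holds against the slow end of the baseline range; against a 150 tokens/second baseline, the same 10-second response still takes about 2 seconds.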
10 · When Will GPT-5.6 Be Available to Everyone?
Right now (June 2026): ~20 approved partner organizations via API and Codex only. Ordinary users cannot access GPT-5.6 in ChatGPT yet.
Coming in July 2026:
- General availability on ChatGPT (Plus and Pro users first)
- Public API access
- GPT-5.6 Sol on Cerebras hardware: up to 750 tokens/second for select enterprise customers
Market prediction: Polymarket traders currently assign an 87% probability that GPT-5.6 will be broadly released by July 31, 2026.
11 · GPT-5.6 Pricing: Is It Worth It?
| Model | Input | Output | vs GPT-5.5 |
|---|---|---|---|
| Sol | $5/M | $30/M | Same price, much better performance |
| Terra | $2.50/M | $15/M | 50% cheaper than Sol, GPT-5.5 performance |
| Luna | $1/M | $6/M | 80% cheaper than Sol |
For comparison: Claude Fable 5 was priced at $10/M input and $50/M output before going offline. GPT-5.6 Sol delivers comparable or superior capability at half the cost. For broader June 2026 pricing context, see our AI price cuts roundup.
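The percentage claims in the pricing table can be checked directly from the listed rates. The sketch below assumes only the prices quoted in this article; the tier keys (`sol`, `terra`, `luna`) are labels for illustration, not confirmed API model IDs.

```python
# USD per 1M tokens as listed in the pricing table: (input, output).
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}


def request_cost(tier, input_tokens, output_tokens):
    """Cost in USD of one request at the listed rates."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def percent_cheaper(tier, baseline="sol"):
    """Percentage saving on input price versus the baseline tier.

    With the listed rates the output-price saving is the same, because each
    tier's input and output prices scale by the same factor.
    """
    return round(100 * (1 - PRICES[tier][0] / PRICES[baseline][0]), 1)


if __name__ == "__main__":
    for tier in PRICES:
        cost = request_cost(tier, 10_000, 2_000)
        print(f"{tier}: ${cost:.4f} per request, {percent_cheaper(tier)}% cheaper than sol")
```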
12 · Which GPT-5.6 Model Should You Use?
| Your Need | Recommended Model |
|---|---|
| Complex code generation, debugging, multi-step agent tasks | Sol |
| Enterprise document analysis, customer support, large-scale API calls | Terra |
| High-frequency summarization, drafting, routine automation | Luna |
| Budget-limited but need flagship-level capability | Terra (GPT-5.5 performance at 50% lower cost) |
| Latency-critical real-time apps (after July Cerebras launch) | Sol on Cerebras |
Use Sol if: You're building complex coding agents, need frontier cybersecurity research, run long-horizon multi-step autonomous tasks, or value accuracy over speed and cost.
Use Terra if: You process high volumes of business documents, need GPT-5.5-level performance at half the API cost, or ship production applications at scale.
Use Luna if: You handle summarization, drafting, classification, or routine automation where latency and cost are top priorities — millions of lightweight API calls per day.
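The recommendations above can be encoded as a small routing helper. This is a sketch under stated assumptions: the task-type labels and the `pick_tier` function are hypothetical, and it returns tier names rather than API model IDs, which have not been published.

```python
# Task categories are illustrative labels derived from the table above.
SOL_TASKS = {"coding_agent", "security_research", "long_horizon_agent"}
TERRA_TASKS = {"document_analysis", "customer_support", "bulk_api"}
LUNA_TASKS = {"summarization", "drafting", "classification", "routine_automation"}


def pick_tier(task_type, budget_limited=False):
    """Return the tier name the recommendation table suggests for a task type."""
    if task_type in LUNA_TASKS:
        return "Luna"
    if task_type in TERRA_TASKS:
        return "Terra"
    if task_type in SOL_TASKS:
        # The table suggests Terra when budget is tight but quality still matters.
        return "Terra" if budget_limited else "Sol"
    raise ValueError(f"unknown task type: {task_type!r}")


if __name__ == "__main__":
    print(pick_tier("coding_agent"))
    print(pick_tier("coding_agent", budget_limited=True))
    print(pick_tier("summarization"))
```

Keeping the mapping in one place makes it easy to reroute traffic if a tier becomes unavailable during a policy-driven access window.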
13 · Safety & Security: What OpenAI Built Into GPT-5.6
Given that all three models hit OpenAI's "High" cybersecurity classification, safety was a primary focus:
- Real-time misuse classifiers running on every output
- Account-level review for sensitive workflows
- 700,000 A100-equivalent GPU hours of automated red-teaming
- Universal jailbreak testing — finding and patching cross-prompt attack vectors
- A specialized large reasoning model that filters responses before they reach users if primary safeguards fail
- All models tested by external security organizations before launch
OpenAI's testing confirmed Sol can identify vulnerabilities and exploit primitives in Chromium and Firefox codebases but cannot autonomously construct complete, functional exploit chains — keeping the family below the "Cyber Critical" threshold.
14 · Five-Step Isolated Mac Checklist: Evaluate GPT-5.6 Before Broad Release
- Lock your current model baseline. Export 30 days of token usage and dollar spend per model (`gpt-5.5`, Claude Opus 4.8, Codex routes) so you have a pre-GPT-5.6 reference line before Sol, Terra, or Luna appear in the API.
- Subscribe to OpenAI release channels. Follow the OpenAI Blog, platform.openai.com changelogs, and ChatGPT status; GPT-5.6 access will surface as new model IDs and tier rollouts, not a consumer feature flag.
- Build a regression prompt suite. Curate 20–50 production prompts with fixed token counts, latency targets, and quality rubrics across agent workflows, coding tasks, and document analysis.
- Rent an isolated Mac sandbox. Configure Cursor with test API keys on an Apple Silicon rental node; validate macOS-only plugins and Keychain flows while running your suite nightly. See M-series compute pricing.
- Re-benchmark 48 hours after API access opens. When GPT-5.6 model IDs go live, rerun the suite, compare total inference spend, TerminalBench-equivalent task success rates, and p95 latency — then adjust production routing or customer-facing pricing.
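The baseline-then-rerun comparison in the checklist can be sketched as a small summary routine. The record shape (`passed`, `latency_s`, `cost_usd`) and the function names are assumptions for illustration; collecting the runs themselves depends on your own harness and is not shown.

```python
def p95(values):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def summarize(runs):
    """runs: list of dicts with 'passed' (bool), 'latency_s', 'cost_usd'."""
    return {
        "success_rate": sum(r["passed"] for r in runs) / len(runs),
        "p95_latency_s": p95([r["latency_s"] for r in runs]),
        "total_cost_usd": sum(r["cost_usd"] for r in runs),
    }


def compare(baseline_runs, candidate_runs):
    """Candidate minus baseline for each metric."""
    base, cand = summarize(baseline_runs), summarize(candidate_runs)
    return {metric: cand[metric] - base[metric] for metric in base}


if __name__ == "__main__":
    # Hypothetical results from running the same prompt suite twice.
    baseline = [
        {"passed": True, "latency_s": 1.0, "cost_usd": 0.5},
        {"passed": False, "latency_s": 2.0, "cost_usd": 0.5},
    ]
    candidate = [
        {"passed": True, "latency_s": 0.5, "cost_usd": 0.25},
        {"passed": True, "latency_s": 1.0, "cost_usd": 0.25},
    ]
    print(compare(baseline, candidate))
```

A positive `success_rate` delta with negative latency and cost deltas is the signal to consider rerouting; mixed results call for a per-workflow breakdown before changing production routing.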
15 · FAQ
Q: Is GPT-5.6 available on ChatGPT now?
A: Not yet for the general public. Currently limited to ~20 trusted partner organizations. Full ChatGPT rollout expected within weeks (July 2026).
Q: Is GPT-5.6 Sol better than Claude Fable 5 for coding?
A: Sol leads on TerminalBench 2.1 (91.9% in Ultra mode vs Claude Mythos 5's 88.0%). Claude Fable 5 may still lead on SWE-Bench Pro, but official GPT-5.6 SWE-Bench scores haven't been published yet, so that comparison is open. On listed prices, Sol is the better value: comparable or better performance at a lower price.
Q: What is "Ultra mode" in GPT-5.6 Sol?
A: Ultra mode deploys multiple AI subagents that work in parallel on different parts of a task, then synthesize a unified result. It significantly boosts performance on complex tasks but uses considerably more tokens.
Q: Why is GPT-5.6 restricted?
A: The U.S. government (via White House / OSTP / ONCD) requested OpenAI limit access during a security review period following Trump's June 2 executive order on AI model safety. OpenAI complied but publicly stated it opposes this becoming permanent practice.
Q: How fast will GPT-5.6 be on Cerebras?
A: Up to 750 tokens per second — roughly 5–15× faster than most current frontier models. Launching July 2026 for select enterprise customers.
Q: What is the GPT-5.6 context window size?
A: Reported at approximately 1.5 million tokens, up from GPT-5.5's 1 million token context. Official confirmation expected with the full system card release.
Q: Are all three GPT-5.6 models safe to use for cybersecurity work?
A: All three carry OpenAI's "High" cybersecurity risk rating, meaning they have significantly elevated capability in vulnerability research. OpenAI has built layered safeguards, including real-time classifiers and red-teaming, to prevent misuse, and its testing found that Sol cannot autonomously build complete, functional exploit chains against hardened real-world targets.
16 · What's Coming Next
- Full GPT-5.6 system card with complete benchmark results (expected at general release)
- Cerebras deployment for Sol at 750 tokens/second (July 2026)
- General availability in ChatGPT (Plus and Pro) and the public API (within weeks)
- U.S. government cyber executive order framework finalization (expected by ~July 2, 2026 per the 30-day window)
17 · References & Further Reading
- OpenAI Official: Previewing GPT-5.6 Sol
- OpenAI Deployment Safety System Card
- VentureBeat: GPT-5.6 Launch Coverage
- SiliconAngle: GPT-5.6 vs Claude Mythos 5
- TechTimes: Government Lock Analysis
18 · Rent a Mac: Isolate Your GPT-5.6 Evaluation Before Public API Access
GPT-5.6 changes what happens inside OpenAI's racks — not on your laptop. But when Sol, Terra, and Luna hit the public API, the teams who win are the ones who already measured baseline token economics and agent success rates in a reproducible environment. Running ad hoc curl scripts from a Windows daily driver mixes OS noise with API signal; polluting your production Mac with experimental keys risks credential bleed when you rotate after a model switch.
A day-rented Apple Silicon Mac gives you a clean macOS shell matching how most teams ship AI products: Cursor for agent workflows, Keychain for API secrets, local scripts for batch regression. Spin it up now, snapshot your pre-GPT-5.6 cost baseline on gpt-5.5, and rerun the same suite the week API model IDs appear — without touching your primary machine. If you are comparing model stacks while government review delays broad access, pair this with our rent vs. buy cost breakdown to decide whether short-term rental or longer commit fits your validation window.