GPT-5.6 Sol, Terra & Luna: Full Review, Benchmarks & Pricing (2026)

Published June 26, 2026 · Updated June 27, 2026 · Sources: OpenAI Official Blog · OpenAI Deployment Safety System Card · VentureBeat · SiliconAngle · TechTimes · Wired

01 · Quick Summary

Model	Best For	Input	Output	Highlight
GPT-5.6 Sol	Flagship / maximum capability	$5 / 1M	$30 / 1M	TerminalBench 2.1 #1 at 91.9%
GPT-5.6 Terra	Balanced / production workhorse	$2.50 / 1M	$15 / 1M	GPT-5.5-level performance, 50% cheaper than Sol
GPT-5.6 Luna	Lightweight / high-frequency	$1 / 1M	$6 / 1M	80% cheaper than Sol; High cyber rating

Current status: limited preview (~20 approved partners). General release expected within weeks (July 2026).

OpenAI's June 26 launch is the most significant model release since GPT-5.5, and the first family in which every tier, including entry-level Luna, crossed OpenAI's internal "High" cybersecurity risk rating. Sol's Ultra multi-agent mode took the TerminalBench 2.1 crown from Claude Mythos 5, which had held it for just 17 days. The catch: a U.S. government security review limits access to roughly 20 vetted organizations until broad rollout.

02 · Three Pain Points for Developers Waiting on GPT-5.6

You cannot benchmark what you cannot call. GPT-5.6 is live for ~20 approved partners only. Teams building on gpt-5.5, or routing through Claude Opus 4.8 after the Fable 5 export ban, have no public API endpoint to regression-test against yet. Prediction markets put the odds of a broad release by July 31 at 87%, but your sprint planning cannot ride Polymarket odds alone.
Ultra mode scores are not your invoice. Sol's record 91.9% TerminalBench score was achieved in Ultra multi-agent mode, which consumes significantly more tokens than standard inference. Budgeting Sol at headline benchmark performance without modeling that token multiplication will blow your cost projections once you ship agent workflows at scale.
Government gatekeeping adds routing risk. In June 2026 all three frontier labs hit policy roadblocks: OpenAI was held to a limited preview, Anthropic was forced offline, and Google delayed Gemini 3.5 Pro. If your stack assumes uninterrupted access to the latest model tier, plan for policy-driven availability windows, not just API rate limits. See our June 2026 release roundup for how fast the landscape shifted in one month.
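The second pain point is easy to quantify before you ship. The sketch below estimates monthly Sol spend for an agent workflow at the list prices in the summary table. The Ultra-mode token multiplier is a hypothetical planning knob (OpenAI has not published how many extra tokens Ultra consumes), and the call volumes are illustrative only.

```python
# Rough monthly cost model for Sol agent workflows.
# ASSUMPTION: token_multiplier is a hypothetical planning parameter for
# multi-agent overhead; OpenAI has not published an Ultra-mode figure.
SOL_INPUT_PER_M = 5.00    # USD per 1M input tokens (list price)
SOL_OUTPUT_PER_M = 30.00  # USD per 1M output tokens (list price)


def monthly_cost(calls_per_day: int,
                 input_tokens: int,
                 output_tokens: int,
                 token_multiplier: float = 1.0,
                 days: int = 30) -> float:
    """Estimate monthly spend in USD.

    token_multiplier scales both input and output tokens per call
    (1.0 = standard inference); subagents each carry their own context.
    """
    calls = calls_per_day * days
    in_cost = calls * input_tokens * token_multiplier * SOL_INPUT_PER_M / 1_000_000
    out_cost = calls * output_tokens * token_multiplier * SOL_OUTPUT_PER_M / 1_000_000
    return in_cost + out_cost


if __name__ == "__main__":
    standard = monthly_cost(10_000, 4_000, 1_000)
    ultra = monthly_cost(10_000, 4_000, 1_000, token_multiplier=4.0)  # hypothetical 4x
    print(f"standard: ${standard:,.0f}/month")
    print(f"ultra (4x tokens, hypothetical): ${ultra:,.0f}/month")
```

With these illustrative inputs (10,000 calls/day, 4,000 input and 1,000 output tokens per call), standard inference comes to $15,000/month and the hypothetical 4× Ultra scenario to $60,000/month. Swap in your own multiplier once you can measure Ultra token usage directly.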

03 · What Is GPT-5.6? The Solar System Naming Explained

GPT-5.6 is OpenAI's newest frontier model series, named after celestial bodies for the first time:

Sol (the Sun) — Flagship, maximum capability for complex coding, cybersecurity research, and long-horizon agent workflows
Terra (the Earth) — Balanced performance and cost for enterprise document analysis, customer support, and high-volume API calls
Luna (the Moon) — Fast, lightweight, affordable tier for summarization, drafting, and routine automation

The release was not smooth. Following President Trump's June 2 executive order on AI model safety, OpenAI was asked to limit GPT-5.6's launch during a government review period — the first time the U.S. government has formally required an AI company to restrict a frontier model's release. OpenAI CEO Sam Altman complied while publicly pushing back:

"We don't believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them."

04 · GPT-5.6 Model Comparison at a Glance

Model	Best For	Input Price	Output Price	Context Window
Sol	Complex coding, security research, long-horizon agents	$5 / 1M tokens	$30 / 1M tokens	~1.5M tokens
Terra	High-volume business tasks, document analysis	$2.50 / 1M tokens	$15 / 1M tokens	~1.5M tokens
Luna	Summarization, drafting, routine automation	$1 / 1M tokens	$6 / 1M tokens	~1.5M tokens

Note: Terra delivers GPT-5.5-level performance at half of Sol's price (Sol is priced the same as GPT-5.5). Luna costs 80% less than Sol while still receiving a "High" cybersecurity rating, making it the first non-flagship OpenAI model to earn High in both the cybersecurity and biology domains.

05 · GPT-5.6 Sol: Max Mode & Ultra Mode

Sol is OpenAI's most capable model to date. Beyond raw performance, it introduces two reasoning modes that did not exist before:

Max Mode

Sol takes additional time to reason before responding — "slow thinking" that trades latency for accuracy. Ideal when you need the answer to be right, not just fast: high-stakes code review, security analysis, or multi-step planning where a wrong first pass costs more than waiting.

Ultra Mode

This is the game-changer. Instead of a single model working through a problem, Ultra mode spawns multiple subagents that split the task, execute in parallel, and merge their results. This multi-agent architecture is why Sol achieved its TerminalBench record of 91.9%. It does consume significantly more tokens, so reserve Ultra for genuinely complex tasks — not every API call in your agent loop.
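To make the split/execute/merge idea concrete, here is a generic fan-out/fan-in sketch using Python's asyncio. It is not OpenAI's Ultra-mode implementation, which has not been published; `run_subagent` is a hypothetical stand-in for one model call per subtask. It also shows where the extra tokens come from: every subtask is a separate call with its own prompt and output.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_subagent(subtask: str) -> str:
    """Hypothetical stand-in for one subagent (one model call per subtask)."""
    await asyncio.sleep(0)  # placeholder for a real network-bound request
    return f"result for {subtask}"


async def fan_out_fan_in(
    task: str,
    split: Callable[[str], List[str]],
    worker: Callable[[str], Awaitable[str]],
    merge: Callable[[List[str]], str],
) -> str:
    """Split a task, run one worker per subtask concurrently, merge results."""
    subtasks = split(task)
    # gather runs the coroutines concurrently and returns results in input order.
    results = await asyncio.gather(*(worker(s) for s in subtasks))
    return merge(list(results))


if __name__ == "__main__":
    answer = asyncio.run(
        fan_out_fan_in(
            "audit repo",
            split=lambda t: [f"{t} / part {i}" for i in range(3)],
            worker=run_subagent,
            merge="\n".join,
        )
    )
    print(answer)
```

In a real agent loop the `merge` step would usually be another model call that reconciles the subagent outputs, which adds still more tokens on top of the per-subtask calls.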

06 · GPT-5.6 Benchmark Results: The Numbers That Matter

Coding: TerminalBench 2.1

TerminalBench 2.1 tests multi-step command-line planning with 89 complex programming challenges — closer to real-world agent tasks than traditional code completion benchmarks.

Model	Score	Mode
GPT-5.6 Sol	91.9% 🏆 New #1	Ultra (multi-agent)
GPT-5.6 Sol	88.8%	Standard
Claude Mythos 5	88.0%	Standard
GPT-5.5	83.4%	Standard
Gemini 3.1 Pro Preview	70.7%	Standard

Claude Mythos 5 had held the top spot for only 17 days (since June 9) before Sol came along.

Long-Horizon Agents: Agent's Last Exam

Model	Task Completion Rate (Code Mode)
GPT-5.6 Sol	50.9% — Only model to cross 50%
GPT-5.6 Luna	Slightly above GPT-5.5

Cybersecurity: CTF & ExploitBench

GPT-5.6 is the first OpenAI model family where all three tiers hit the "High" cybersecurity classification.

Capture-The-Flag (CTF) evaluation:

Model	Hit Rate
Sol	96.7%
Terra	91.84%
Luna	85.19%

ExploitBench (vulnerability research): Sol matches Anthropic's Mythos Preview while using only ~1/3 of the output tokens. That is comparable benchmark performance at roughly a third of the output-token volume, which translates into a substantially lower bill.

Safety note: OpenAI's red-teaming confirmed Sol cannot autonomously engineer a complete, functional exploit chain against real-world hardened targets (Chromium, Firefox). It stays below OpenAI's "Cyber Critical" threshold.

Life Sciences: GeneBench v1 & HealthBench

GeneBench v1 (genomics & quantitative biology): Sol matches or exceeds GPT-5.5 using fewer tokens
HealthBench Professional: Sol scores 60.5, 8.7 points above GPT-5.5

07 · GPT-5.6 vs Claude Mythos 5: Which Is Better for Coding?

This is the comparison everyone is asking about. Here's the honest breakdown:

Category	GPT-5.6 Sol	Claude Mythos 5
TerminalBench 2.1	91.9% (Ultra) / 88.8% standard ✅	88.0%
ExploitBench	Matches Mythos Preview with ~1/3 the output tokens ✅	Strong (restricted access)
Pricing	$5 input / $30 output ✅	$10 input / $50 output (offline)
Availability	Limited preview → general release soon	Currently offline (U.S. export control)
Context Window	~1.5M tokens ✅	200K tokens

Bottom line: Sol beats Mythos 5 on TerminalBench 2.1, narrowly in standard mode (88.8% vs 88.0%) and by 3.9 points in Ultra mode (91.9%), and offers comparable security research capability at a fraction of the cost. However, Mythos 5 may still lead on benchmarks like SWE-Bench Pro, where GPT-5.6 system card data hasn't been fully published yet. We'll update this comparison once OpenAI releases the complete benchmark report. For routing alternatives while Mythos stays offline, see our AI coding assistant comparison.

08 · The Government Restriction: Why Can't I Access GPT-5.6 Yet?

What happened

On June 2, 2026, President Trump signed an executive order allowing U.S. government agencies up to 30 days of pre-release access to review frontier AI models for national security concerns.

On June 26, following a White House request coordinated by the Office of Science and Technology Policy (OSTP) and the Office of the National Cyber Director (ONCD), OpenAI agreed to limit GPT-5.6's launch to approximately 20 pre-approved "trusted partner" organizations.

Why it matters

This is the first time the U.S. government has formally required an AI company to restrict a model's release — setting a precedent that could reshape how frontier models are deployed globally.

OpenAI complied while publicly objecting that this kind of government access process should not become the long-term default.

Context: the "Big Three" all got blocked in June

Company	Model	Status
OpenAI	GPT-5.6 Sol/Terra/Luna	Limited preview (~20 orgs)
Anthropic	Claude Fable 5 / Mythos 5	Forced offline June 12 via export control
Google	Gemini 3.5 Pro	Delayed to July

June 2026 was supposed to be the biggest month in AI history. Instead, all three flagship releases got blocked at the door.

09 · GPT-5.6 on Cerebras: 750 Tokens Per Second

Starting in July, OpenAI is launching Sol on Cerebras hardware. The headline number: 750 tokens per second.

Most frontier models today:     50–150 tokens/second
GPT-5.6 Sol on Cerebras:        750 tokens/second  (5× to 15× faster)

Example: a 10-second response today (500 to 1,500 tokens at 50–150 tokens/second) → roughly 0.7 to 2 seconds at 750 tokens/second
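The arithmetic behind that example is simply output tokens divided by throughput. This sketch counts decode time only and ignores time-to-first-token and network overhead, so real end-to-end latency will be somewhat higher.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Decode time only: ignores time-to-first-token and network overhead."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    for baseline_tps in (50, 150):
        tokens = 10 * baseline_tps  # tokens produced by a 10-second response today
        fast = generation_seconds(tokens, 750)
        print(f"{baseline_tps} tok/s baseline: {tokens} tokens -> {fast:.2f} s at 750 tok/s")
```

The speedup only pushes a 10-second response under one second when today's baseline is below 75 tokens/second; at a 150 tokens/second baseline the same answer still takes about 2 seconds.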

For applications like real-time coding assistants, interactive agents, or live customer-facing AI, this isn't just a speed bump; it's a category change. Initial access will be limited to select enterprise customers as Cerebras expands capacity. Combined with OpenAI's Jalapeño custom inference chip roadmap, this shifts inference economics on two fronts: faster tokens and cheaper silicon.

10 · When Will GPT-5.6 Be Available to Everyone?

Right now (June 2026): ~20 approved partner organizations via API and Codex only. Ordinary users cannot access GPT-5.6 in ChatGPT yet.

Coming in July 2026:

General availability on ChatGPT (Plus and Pro users first)
Public API access
GPT-5.6 Sol on Cerebras hardware: up to 750 tokens/second for select enterprise customers

Market prediction: Polymarket traders currently assign an 87% probability that GPT-5.6 will be broadly released by July 31, 2026.

11 · GPT-5.6 Pricing: Is It Worth It?

Model	Input	Output	vs GPT-5.5
Sol	$5/M	$30/M	Same price; higher benchmark scores (88.8% vs 83.4% on TerminalBench 2.1, standard mode)
Terra	$2.50/M	$15/M	GPT-5.5-level performance at 50% lower cost
Luna	$1/M	$6/M	80% cheaper than GPT-5.5 and Sol

For comparison: Claude Fable 5 was priced at $10/M input and $50/M output before going offline. GPT-5.6 Sol delivers comparable or superior capability at half the input price and 40% less per output token. For broader June 2026 pricing context, see our AI price cuts roundup.
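A quick way to compare tiers is to price one representative workload against each. This sketch uses the list prices above (Claude Fable 5 at its pre-offline price); the 200M-input / 40M-output monthly workload is a hypothetical example, so substitute your own token counts from billing exports.

```python
# USD per 1M tokens: (input, output), from the pricing table above.
PRICES = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
    "Claude Fable 5": (10.00, 50.00),  # price before going offline
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given number of input and output tokens."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 200M input tokens, 40M output tokens.
    for name in PRICES:
        print(f"{name:15s} ${workload_cost(name, 200_000_000, 40_000_000):>9,.2f}")
```

For that workload the script reports $2,200 for Sol, $1,100 for Terra, $440 for Luna, and $4,000 for Fable 5. Because Terra and Luna are priced at fixed fractions of Sol on both input and output, the 50% and 80% savings hold for any token mix; Sol's cost relative to Fable 5 ranges from 50% (input-heavy) to 60% (output-heavy).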

12 · Which GPT-5.6 Model Should You Use?

Your Need	Recommended Model
Complex code generation, debugging, multi-step agent tasks	Sol
Enterprise document analysis, customer support, large-scale API calls	Terra
High-frequency summarization, drafting, routine automation	Luna
Budget-limited but need GPT-5.5-level capability	Terra (GPT-5.5 performance at 50% lower cost)
Latency-critical real-time apps (after July Cerebras launch)	Sol on Cerebras

Use Sol if: You're building complex coding agents, need frontier cybersecurity research, run long-horizon multi-step autonomous tasks, or accuracy matters more than speed or cost.

Use Terra if: You process high volumes of business documents, need GPT-5.5-level performance at half the API cost, or ship production applications at scale.

Use Luna if: You handle summarization, drafting, classification, or routine automation where latency and cost are top priorities — millions of lightweight API calls per day.
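The recommendations above translate naturally into a routing table, and giving each task a fallback chain also addresses the availability risk from policy-driven access windows. A minimal sketch, assuming hypothetical model IDs (`gpt-5.6-sol` and friends are placeholders, not confirmed API names) and hypothetical task labels; `is_available` is whatever feature flag or health probe you already maintain.

```python
from typing import Callable, Dict, List

# Hypothetical model IDs: placeholders, not confirmed API names.
SOL, TERRA, LUNA = "gpt-5.6-sol", "gpt-5.6-terra", "gpt-5.6-luna"
FALLBACK = "gpt-5.5"  # assumed baseline model you already run today

# Hypothetical task labels mapped to preference chains mirroring the table above.
ROUTES: Dict[str, List[str]] = {
    "complex_coding": [SOL, FALLBACK],
    "agent_task": [SOL, FALLBACK],
    "document_analysis": [TERRA, FALLBACK],
    "customer_support": [TERRA, FALLBACK],
    "summarization": [LUNA, TERRA, FALLBACK],
    "classification": [LUNA, TERRA, FALLBACK],
}
DEFAULT_CHAIN = [TERRA, FALLBACK]  # unknown labels go to the balanced tier


def choose_model(task_type: str, is_available: Callable[[str], bool]) -> str:
    """Return the first available model in the task's preference chain."""
    for model in ROUTES.get(task_type, DEFAULT_CHAIN):
        if is_available(model):
            return model
    raise RuntimeError(f"no available model for task {task_type!r}")


if __name__ == "__main__":
    live = {"gpt-5.5"}  # e.g. June 2026: no public GPT-5.6 IDs yet
    print(choose_model("summarization", lambda m: m in live))
```

Defaulting unknown task labels to Terra is a judgment call that matches its "production workhorse" positioning; an accuracy-first team might default to Sol instead.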

13 · Safety & Security: What OpenAI Built Into GPT-5.6

Given that all three models hit OpenAI's "High" cybersecurity classification, safety was a primary focus:

Real-time misuse classifiers running on every output
Account-level review for sensitive workflows
700,000 A100-equivalent GPU hours of automated red-teaming
Universal jailbreak testing — finding and patching cross-prompt attack vectors
A specialized large reasoning model filters responses before they reach users if primary safeguards fail
All models tested by external security organizations before launch

OpenAI's testing confirmed Sol can identify vulnerabilities and exploit primitives in Chromium and Firefox codebases but cannot autonomously construct complete, functional exploit chains — keeping the family below the "Cyber Critical" threshold.

14 · Five-Step Isolated Mac Checklist: Evaluate GPT-5.6 Before Broad Release

Lock your current model baseline. Export 30 days of token usage and dollar spend per model (gpt-5.5, Claude Opus 4.8, Codex routes) so you have a pre-GPT-5.6 reference line before Sol, Terra, or Luna appear in the API.
Subscribe to OpenAI release channels. OpenAI Blog, platform.openai.com changelogs, and ChatGPT status — GPT-5.6 access will surface as new model IDs and tier rollouts, not a consumer feature flag.
Build a regression prompt suite. Curate 20–50 production prompts with fixed token counts, latency targets, and quality rubrics across agent workflows, coding tasks, and document analysis.
Rent an isolated Mac sandbox. Configure Cursor with test API keys on an Apple Silicon rental node; validate macOS-only plugins and Keychain flows while running your suite nightly. See M-series compute pricing.
Re-benchmark 48 hours after API access opens. When GPT-5.6 model IDs go live, rerun the suite, compare total inference spend, TerminalBench-equivalent task success rates, and p95 latency — then adjust production routing or customer-facing pricing.
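Steps 3 and 5 come down to comparing a few aggregate metrics between a baseline run and a candidate run. A minimal sketch, assuming you log one record per prompt with latency, cost, and a pass/fail verdict from your quality rubric; `RunResult` and its fields are hypothetical names for whatever your own logging produces.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunResult:
    """One prompt from the regression suite, run once against one model."""
    latency_s: float
    cost_usd: float
    passed: bool  # did the output meet your quality rubric?


def p95(values: List[float]) -> float:
    # quantiles(n=20) returns 19 cut points (5%, 10%, ..., 95%); index 18 is p95.
    # Needs at least two data points, which a 20-50 prompt suite easily provides.
    return statistics.quantiles(values, n=20)[18]


def summarize(runs: List[RunResult]) -> Dict[str, float]:
    return {
        "total_cost_usd": sum(r.cost_usd for r in runs),
        "success_rate": sum(r.passed for r in runs) / len(runs),
        "p95_latency_s": p95([r.latency_s for r in runs]),
    }


def compare(baseline: List[RunResult], candidate: List[RunResult]) -> Dict[str, float]:
    """Candidate minus baseline per metric (negative cost or latency is better)."""
    b, c = summarize(baseline), summarize(candidate)
    return {key: c[key] - b[key] for key in b}
```

Run `summarize` on your gpt-5.5 baseline now, store the result, and call `compare` once the GPT-5.6 suite finishes; a cost or latency improvement paired with a drop in `success_rate` is the signal to keep the old route for that task type.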

15 · FAQ

Q: Is GPT-5.6 available on ChatGPT now?
A: Not yet for the general public. Currently limited to ~20 trusted partner organizations. Full ChatGPT rollout expected within weeks (July 2026).

Q: Is GPT-5.6 Sol better than Claude Fable 5 for coding?
A: Sol leads on TerminalBench 2.1 (91.9% in Ultra mode vs Claude Mythos 5's 88.0%). Claude Fable 5 may still lead on SWE-Bench Pro, but official GPT-5.6 SWE-Bench Pro scores haven't been published yet. Sol is the better value: comparable or better performance at a lower price.

Q: What is "Ultra mode" in GPT-5.6 Sol?
A: Ultra mode deploys multiple AI subagents that work in parallel on different parts of a task, then synthesize a unified result. It significantly boosts performance on complex tasks but uses considerably more tokens.

Q: Why is GPT-5.6 restricted?
A: The U.S. government (via White House / OSTP / ONCD) requested OpenAI limit access during a security review period following Trump's June 2 executive order on AI model safety. OpenAI complied but publicly stated it opposes this becoming permanent practice.

Q: How fast will GPT-5.6 be on Cerebras?
A: Up to 750 tokens per second — roughly 5–15× faster than most current frontier models. Launching July 2026 for select enterprise customers.

Q: What is the GPT-5.6 context window size?
A: Reported at approximately 1.5 million tokens, up from GPT-5.5's 1 million token context. Official confirmation expected with the full system card release.

Q: Are all three GPT-5.6 models safe to use for cybersecurity work?
A: All three carry OpenAI's "High" cybersecurity risk rating — meaning they have significantly elevated capability in vulnerability research. OpenAI has built layered safeguards including real-time classifiers and red-teaming to prevent misuse, and confirmed the models cannot autonomously build complete functional exploits.

16 · What's Coming Next

Full GPT-5.6 system card with complete benchmark results (expected at general release)
Cerebras deployment for Sol at 750 tokens/s (July 2026)
ChatGPT general availability across Plus, Pro, and API (within weeks)
U.S. government cyber executive order framework finalization (expected by ~July 2, 2026 per the 30-day window)

18 · Rent a Mac: Isolate Your GPT-5.6 Evaluation Before Public API Access

GPT-5.6 changes what happens inside OpenAI's racks — not on your laptop. But when Sol, Terra, and Luna hit the public API, the teams who win are the ones who already measured baseline token economics and agent success rates in a reproducible environment. Running ad hoc curl scripts from a Windows daily driver mixes OS noise with API signal; polluting your production Mac with experimental keys risks credential bleed when you rotate after a model switch.

A day-rented Apple Silicon Mac gives you a clean macOS shell matching how most teams ship AI products: Cursor for agent workflows, Keychain for API secrets, local scripts for batch regression. Spin it up now, snapshot your pre-GPT-5.6 cost baseline on gpt-5.5, and rerun the same suite the week API model IDs appear — without touching your primary machine. If you are comparing model stacks while government review delays broad access, pair this with our rent vs. buy cost breakdown to decide whether short-term rental or longer commit fits your validation window.