OpenAI Spud: GPT-5.5 Pretraining Done, April Release Likely

Abhishek Gautam · 8 min read

Quick summary

OpenAI's next model 'Spud' completed pretraining on March 24. Polymarket gives 78% odds it ships by April 30. What GPT-5.5 means for API pricing and developers.

OpenAI's next flagship model — internally codenamed "Spud" — completed pretraining on March 24, 2026. Polymarket currently prices the probability of a release before April 30 at 78%. The expected release window is April 14 to May 5, with mid-April being the mode estimate based on OpenAI's post-pretraining deployment pipeline.

This is more than a rumour. The pretraining completion date came from a researcher disclosure, and the Polymarket contract has attracted meaningful liquidity from traders tracking OpenAI's release cadence. The more useful question is not whether Spud ships in April (the market says it probably will) but what it actually does that GPT-5 does not.

What "Spud" Is in OpenAI's Model Stack

OpenAI's current public model lineup runs: GPT-4o (fast multimodal), GPT-4.5 (incremental reasoning improvement), GPT-5 (current flagship, released February 2026), o3 and o3-mini (reasoning-optimised, separate track). Spud is the next iteration above GPT-5 on the main capability track.

OpenAI has not published a Spud spec sheet. What the research community has assembled from capability evaluations and infrastructure leaks:

  • Larger context window: GPT-5 shipped with a 128K token context. Spud is expected to extend this, with estimates ranging from 256K to 512K tokens. The driving use case is long-document processing and multi-turn agentic tasks where GPT-5 hits truncation limits in production.
  • Improved tool use: GPT-5's function calling and tool use are already good. Spud's are reportedly meaningfully better on multi-step tool chains, which is the specific capability that agentic frameworks like LangChain, AutoGPT successors, and Claude Code depend on.
  • Lower hallucination rate on factual queries: The benchmarks reportedly cited internally are TriviaQA and PopQA, specifically their obscure-fact questions. On those questions GPT-5 still hallucinates at rates that matter in production RAG pipelines.
  • Code generation improvement: Pass@1 on HumanEval and LiveCodeBench reportedly improves by 8–12 percentage points over GPT-5.
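HumanEval and LiveCodeBench results are usually reported as pass@k. The sketch below implements the unbiased pass@k estimator published with HumanEval. With k=1 it reduces to the fraction of generated samples that pass the unit tests. The per-problem results are made up for illustration. A "percentage point" gain is absolute: a model moving from 70% to 78–82% pass@1 would fall in the reported 8–12 point range.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: samples generated, c: samples that pass the tests, k: sample budget.
    Returns the probability that at least one of k samples drawn without
    replacement from the n generated samples passes.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every k-subset must contain at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Mean pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples generated, samples passing).
results = [(10, 3), (10, 0), (10, 10), (10, 5)]
score = benchmark_pass_at_k(results, k=1)
print(f"pass@1 = {score:.1%}")
```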

None of these numbers are confirmed. They are informed estimates from the community tracking OpenAI's eval scores and researcher commentary.

Why Polymarket Is at 78%, Not Higher

Polymarket contracts on AI model releases are imprecise instruments. They capture collective market belief, not inside information. The 22% "no" probability on an April 30 release reflects three legitimate risks:

Post-training safety evaluations: OpenAI's red-teaming process after pretraining completion has historically added 3–8 weeks, though not always. GPT-4 finished pretraining in August 2022 and was released in March 2023, roughly seven months later. GPT-5 had a shorter post-training cycle, but the company has faced more regulatory scrutiny since the EU AI Act came into effect.
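You can apply the 3–8 week post-training range to the March 24 completion date with simple date arithmetic. The sketch below assumes post-training starts the day pretraining ends, which is a simplification. Under that assumption the full range runs from April 14 to May 19, the first day of Google I/O. The April 14–May 5 window quoted above corresponds to a 3–6 week cycle.

```python
from datetime import date, timedelta
from typing import Tuple

PRETRAINING_DONE = date(2026, 3, 24)  # reported pretraining completion date


def release_window(min_weeks: int, max_weeks: int) -> Tuple[date, date]:
    """Earliest and latest release dates if post-training takes min..max weeks."""
    return (PRETRAINING_DONE + timedelta(weeks=min_weeks),
            PRETRAINING_DONE + timedelta(weeks=max_weeks))


earliest, latest = release_window(3, 8)
print(earliest.isoformat(), latest.isoformat())  # 2026-04-14 2026-05-19
print(release_window(3, 6)[1].isoformat())       # 2026-05-05
```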

Regulatory review pressure: The EU AI Act's general-purpose AI provisions require OpenAI to file a model card and capability disclosure for models above the compute threshold. GPT-5 was the first model subject to this requirement, and Spud will be too. If the EU requests additional documentation, the timeline slips.

Competitive timing: OpenAI has a history of shipping to blunt competitor announcements. Google I/O is May 19–20, 2026. If Gemini 3.1 Ultra ships before Google I/O, OpenAI has an incentive to release Spud before May 19 rather than wait. Competitive timing pushes toward April, not May.

The base case remains April 14–30. The tail risk is early May.

GPT-5 vs Spud: What Benchmark Improvements Actually Mean

The developer-relevant question is not "which model scores higher on MMLU" — it is: what breaks in production with GPT-5 that won't break with Spud?

Three categories of production failures in GPT-5 that Spud reportedly addresses:

Long context coherence: GPT-5 at 128K context degrades noticeably in the final 20% of the window. In practice, this means that RAG pipelines stuffing 100K+ tokens of retrieved documents get lower-quality synthesis than expected. If Spud extends to 256K with better coherence, the effective quality threshold for long-context RAG improves meaningfully.
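The "final 20% of the window" claim translates into a concrete token count. The helper below is a hypothetical sketch that treats degradation as a hard cutoff, which real models do not exhibit. It still shows the practical problem. A 100K-token RAG payload sits just under the 102,400-token mark, so a system prompt and a few extra retrieved chunks push it over.

```python
GPT5_CONTEXT_TOKENS = 128_000  # GPT-5 context window cited above
DEGRADED_TAIL_PCT = 20         # "final 20% of the window" (reported, not official)


def degraded_zone_start(context_tokens: int = GPT5_CONTEXT_TOKENS,
                        tail_pct: int = DEGRADED_TAIL_PCT) -> int:
    """Token count beyond which a prompt enters the reportedly degraded tail.

    Uses integer arithmetic to avoid floating-point rounding.
    """
    return context_tokens * (100 - tail_pct) // 100


def in_degraded_zone(prompt_tokens: int) -> bool:
    """True if the prompt extends past the start of the degraded tail."""
    return prompt_tokens > degraded_zone_start()


print(degraded_zone_start())      # 102400
print(in_degraded_zone(100_000))  # False
print(in_degraded_zone(105_000))  # True
```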

Tool call reliability in 5+ step chains: In agentic workflows where the model must make 5 or more sequential tool calls, GPT-5's error rate compounds. Researchers have documented failure modes where the model "forgets" earlier tool outputs in long chains or calls the wrong tool when the chain branches. An improvement here directly affects anyone building agents for code review, document processing, or multi-API orchestration.
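A back-of-the-envelope model shows why errors "compound". If each tool call independently fails with probability p, an n-step chain succeeds with probability (1 - p)^n. The 3% and 1% per-call error rates below are hypothetical. The independence assumption is also optimistic, because the failure modes described above (forgetting earlier tool outputs) tend to get worse as chains grow.

```python
def chain_success_rate(per_call_error: float, steps: int) -> float:
    """Probability that all `steps` tool calls succeed, assuming independent failures."""
    if not 0.0 <= per_call_error <= 1.0:
        raise ValueError("per_call_error must be a probability")
    return (1.0 - per_call_error) ** steps


for p in (0.03, 0.01):  # hypothetical per-call error rates
    rates = ", ".join(f"{n} steps: {chain_success_rate(p, n):.1%}" for n in (1, 5, 10))
    print(f"p={p:.0%} -> {rates}")
```

At a 3% per-call error rate, a 10-step chain succeeds only about 74% of the time. Cutting the per-call rate to 1% lifts that to about 90%.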

JSON output consistency: GPT-5 produces malformed JSON at low but non-trivial rates when output schemas are complex. Spud reportedly reduces this failure rate below 0.1% — which matters if you're making millions of API calls and retrying failures adds cost.
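Whatever the model's malformed-JSON rate turns out to be, production code usually validates output and retries. The sketch below uses only Python's standard `json` module. `call_model` is a hypothetical stand-in for your API call, and the schema check is a simple required-keys test rather than full JSON Schema validation. The second function estimates retry volume. It assumes each attempt fails independently and that failed requests are retried until they succeed, which gives expected extra calls = requests * p / (1 - p). The 1% baseline rate is hypothetical; 0.1% is the reported Spud figure.

```python
import json
from typing import Any, Callable, Dict, Optional, Set


def parse_with_retry(call_model: Callable[[], str],
                     required_keys: Set[str],
                     max_attempts: int = 3) -> Dict[str, Any]:
    """Call the model until it returns a JSON object containing required_keys."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        raw = call_model()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed JSON: try again
            continue
        if isinstance(data, dict) and required_keys.issubset(data):
            return data
        last_error = ValueError("valid JSON but missing required keys")
    raise RuntimeError(f"no usable JSON after {max_attempts} attempts") from last_error


def expected_extra_calls(requests: int, failure_rate: float) -> float:
    """Expected retries when every failed attempt is retried until it succeeds."""
    return requests * failure_rate / (1.0 - failure_rate)


# Hypothetical model that first returns truncated JSON, then a valid object.
responses = iter(['{"status": "ok"', '{"status": "ok", "items": []}'])
result = parse_with_retry(lambda: next(responses), {"status", "items"})
print(result)

for rate in (0.01, 0.001):
    extra = expected_extra_calls(1_000_000, rate)
    print(f"{rate:.1%} failures -> {extra:,.0f} extra calls per 1M requests")
```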

These are not glamorous benchmark improvements. They are reliability improvements that reduce the engineering overhead of building production systems on OpenAI's API.

API Pricing: What to Expect

OpenAI has not announced Spud pricing. The historical pattern is that each new flagship launches at a premium over its predecessor, then the predecessor gets repriced downward.

Current GPT-5 pricing: $15/1M input tokens and $60/1M output tokens (as of April 2026). When GPT-4 Turbo launched in November 2023, OpenAI priced its input tokens at a third of GPT-4's rate (and its output tokens at half), a deliberate volume play. GPT-5 did not follow that pattern; it launched at a significant premium over GPT-4o.

For Spud, two scenarios:

  • Premium tier: Spud launches above GPT-5 pricing, positioned as the "reasoning at scale" tier above GPT-5. GPT-5 gets a small price reduction to hold its market share against Gemini 3.1 Ultra.
  • Replacement pricing: Spud prices at GPT-5 levels, GPT-5 gets deprecated or moved to a legacy tier. This has happened with GPT-3.5 and GPT-4 Turbo.

The former is more likely given OpenAI's current revenue pressure — the company needs each new model to expand revenue, not cannibalize existing tiers. Budget for Spud at $18–25/1M input tokens if the premium scenario plays out.
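The sketch below shows what the premium scenario means for a concrete workload. It prices a hypothetical month of 500M input and 50M output tokens at GPT-5's cited rates and at both ends of the $18–25 input range. OpenAI has not published a Spud output price, so the sketch assumes it stays at GPT-5's $60/1M. Treat that as a placeholder.

```python
GPT5_INPUT_PER_M = 15.00   # USD per 1M input tokens (GPT-5, as cited above)
GPT5_OUTPUT_PER_M = 60.00  # USD per 1M output tokens (GPT-5, as cited above)


def monthly_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """API cost in USD for a month's token volume at per-million-token prices."""
    return (input_tokens / 1_000_000) * input_per_m + (output_tokens / 1_000_000) * output_per_m


# Hypothetical monthly workload.
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 50_000_000

scenarios = {
    "GPT-5": GPT5_INPUT_PER_M,
    "Spud, low estimate": 18.00,
    "Spud, high estimate": 25.00,
}
baseline = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, GPT5_INPUT_PER_M, GPT5_OUTPUT_PER_M)
for label, input_price in scenarios.items():
    # Assumption: Spud output pricing is unknown, so GPT-5's output rate is reused.
    cost = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, input_price, GPT5_OUTPUT_PER_M)
    print(f"{label}: ${cost:,.0f} ({cost / baseline - 1:+.0%} vs GPT-5)")
```

The input price alone rises 20–67%. The total bill rises less than that in this sketch, because output spend is held constant.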

What Developers Should Do Before the Release

Evaluate your GPT-5 production failure modes now. If you have logging on your API calls, pull the error rate on malformed outputs, tool call failures, and context truncation events. These are the categories where Spud is most likely to improve. Knowing your current failure rate gives you a baseline to measure improvement against.
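If your logs record per-request outcomes, the baseline is a few counts divided by request volume. The field names below (`malformed_json`, `tool_call_failed`, `context_truncated`) are hypothetical, so map them to whatever your logging actually captures. The sketch assumes one JSON object per log line.

```python
import json
from collections import Counter
from typing import Dict, Iterable

# Hypothetical boolean fields in each logged request.
FAILURE_FIELDS = ("malformed_json", "tool_call_failed", "context_truncated")


def failure_baseline(lines: Iterable[str]) -> Dict[str, float]:
    """Failure rate per category from JSON-lines request logs."""
    counts: Counter = Counter()
    total = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        for field in FAILURE_FIELDS:
            if record.get(field):  # missing fields count as "no failure"
                counts[field] += 1
    return {field: (counts[field] / total if total else 0.0) for field in FAILURE_FIELDS}


sample_log = [
    '{"model": "gpt-5", "malformed_json": false, "tool_call_failed": true}',
    '{"model": "gpt-5", "malformed_json": true}',
    '{"model": "gpt-5"}',
    '{"model": "gpt-5", "context_truncated": true}',
]
print(failure_baseline(sample_log))
```

With a real log file, pass the open file handle (for example `open("requests.jsonl")`, a hypothetical path) instead of the in-memory list.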

Hold off on architectural workarounds. If you are currently adding engineering complexity to work around GPT-5's long-context coherence degradation — chunking, summarization layers, retrieval reruns — wait for Spud before building those workarounds into permanent architecture. If Spud solves the coherence problem, the workaround becomes technical debt.
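One way to follow this advice is to put any workaround you do need behind a per-model setting. Removing it later then becomes a configuration change rather than a refactor. The sketch below is hypothetical. The `spud-placeholder` entry and its 256K limit stand in for values OpenAI has not announced. The GPT-5 chunking threshold reuses the 80% mark implied by the reported degradation in the final 20% of the window.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelProfile:
    context_tokens: int
    # Chunk documents above this size; None means send them whole.
    chunk_above_tokens: Optional[int]


PROFILES: Dict[str, ModelProfile] = {
    "gpt-5": ModelProfile(context_tokens=128_000, chunk_above_tokens=102_400),
    # Hypothetical entry: Spud's model ID and limits are unannounced.
    "spud-placeholder": ModelProfile(context_tokens=256_000, chunk_above_tokens=None),
}


def should_chunk(model: str, doc_tokens: int) -> bool:
    """Decide whether to apply the chunking workaround for this model and document."""
    profile = PROFILES[model]
    if doc_tokens > profile.context_tokens:
        return True  # does not fit at all, so chunking is unavoidable
    return profile.chunk_above_tokens is not None and doc_tokens > profile.chunk_above_tokens


print(should_chunk("gpt-5", 110_000))             # True
print(should_chunk("spud-placeholder", 110_000))  # False
```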

Watch for API access and tier announcements. OpenAI typically announces new models with a waitlist or tier restriction. If you use the GPT-5 API today, you will likely get access to Spud on the same tier within days of launch, based on the GPT-4 to GPT-4 Turbo transition pattern.

Do not cancel existing evals. If you have an evaluation suite running against GPT-5, keep it running. In the first week after Spud launches, the most useful thing you can do is run your existing evals against Spud and publish the comparison. That data is immediately useful to other developers and tends to draw traffic to wherever you publish it.
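When you rerun the suite, compare results case by case rather than only by aggregate pass rate. A new model can score higher overall while breaking cases your product depends on. This sketch assumes each eval result is a pass/fail boolean keyed by a case ID. The IDs and results shown are hypothetical.

```python
from typing import Dict, List, NamedTuple


class EvalComparison(NamedTuple):
    baseline_pass_rate: float
    candidate_pass_rate: float
    fixed: List[str]      # failed on baseline, pass on candidate
    regressed: List[str]  # passed on baseline, fail on candidate


def compare_evals(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> EvalComparison:
    """Compare two models on the eval cases both were run against."""
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("no shared eval cases")
    return EvalComparison(
        baseline_pass_rate=sum(baseline[c] for c in shared) / len(shared),
        candidate_pass_rate=sum(candidate[c] for c in shared) / len(shared),
        fixed=[c for c in shared if not baseline[c] and candidate[c]],
        regressed=[c for c in shared if baseline[c] and not candidate[c]],
    )


# Hypothetical results for the same four eval cases.
gpt5_results = {"json-001": True, "tool-chain-007": False, "long-ctx-003": False, "summ-010": True}
spud_results = {"json-001": True, "tool-chain-007": True, "long-ctx-003": True, "summ-010": False}
print(compare_evals(gpt5_results, spud_results))
```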

The Competitive Context: Why Timing Matters

OpenAI shipping Spud in April compresses the window for Anthropic and Google to establish their models as the default for production workloads. Claude 3.7 Sonnet (released February 2026) was briefly the benchmark leader on coding tasks. If Spud exceeds Claude 3.7 Sonnet on code generation and tool use, the window of Claude dominance on coding closes.

The pattern from the past 18 months: each new leading model holds the top position for 6–10 weeks before the next competitor ships. Spud would reset that clock to OpenAI.

For developers choosing a primary model for a new production system, the practical advice is: wait 3–4 weeks. If Spud ships by April 30 and beats Claude 3.7 Sonnet on your specific use case, you have a clear choice. If Spud does not ship or does not beat Claude on your tasks, Claude 3.7 Sonnet remains the answer for code-heavy workloads.

Key Takeaways

  • OpenAI "Spud" completed pretraining on March 24, 2026. The post-training safety and alignment phase typically adds 3–8 weeks (April 14 to May 19 at the extremes), and the expected release window is April 14–May 5
  • Polymarket at 78% probability by April 30 — reflects high but not certain confidence in the April timeline; 22% tail risk from regulatory review and post-training delays
  • Expected improvements: larger context window (256K–512K), better multi-step tool call reliability, lower malformed JSON rate, 8–12 point code generation improvement on HumanEval
  • Pricing: expect a premium over GPT-5's $15/1M input tokens — budget for $18–25/1M if OpenAI uses a tiered launch
  • For production decisions: hold off on architectural workarounds for GPT-5 context limitations; audit your current API failure rates so you can benchmark against Spud immediately at launch
  • Competitive context: Google I/O is May 19-20 — OpenAI has incentive to ship Spud before then to pre-empt Gemini 3.1 Ultra announcements

Compare current AI API costs across providers with LLM API Pricing. See how Spud will stack up in the Claude vs ChatGPT comparison. For context on the semiconductor supply enabling these models, read TSMC Q1 2026 earnings and the AI chip supply chain.

FAQ

When is OpenAI Spud / GPT-5.5 releasing?

OpenAI's "Spud" model completed pretraining on March 24, 2026. The expected release window is April 14 to May 5, 2026. Polymarket prediction markets price the probability at 78% by April 30. Post-training safety evaluations and regulatory review under the EU AI Act are the main factors that could push the timeline to early May.

What is OpenAI Spud and how does it differ from GPT-5?

Spud is OpenAI's next flagship model above GPT-5 on the main capability track (separate from the o3 reasoning track). Expected improvements include a larger context window (estimates range 256K–512K tokens vs GPT-5's 128K), significantly better multi-step tool call reliability for agentic workflows, lower hallucination rate on factual queries, and 8–12 percentage point improvement on coding benchmarks like HumanEval.

How will OpenAI Spud be priced compared to GPT-5?

OpenAI has not announced Spud pricing. Based on the historical pattern, Spud will likely launch at a premium above GPT-5's current $15/1M input tokens — estimate $18–25/1M for input tokens in the premium tier scenario. The alternative is that Spud replaces GPT-5 at equivalent pricing, with GPT-5 moving to a legacy discounted tier.

Should I wait for OpenAI Spud before building a production AI system?

If your system is heavily dependent on long-context coherence, multi-step tool use reliability, or structured JSON output, waiting 3–4 weeks for Spud makes sense — these are the categories where Spud reportedly improves most over GPT-5. For text generation, summarization, or single-turn tasks where GPT-5 already performs well, there is no compelling reason to wait.

What Polymarket contract is tracking GPT-5.5 release?

Polymarket has a binary contract on whether OpenAI releases a new flagship model (referred to as GPT-5.5 or Spud) before April 30, 2026. As of April 11, 2026, the contract is pricing at approximately 78% "yes." The price does not reflect inside information. It reflects the collective belief of traders tracking OpenAI's development and release cadence.

Written by

Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 952+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.