OpenAI Unveils Jalapeño: First Custom AI Chip
Quick summary
OpenAI and Broadcom unveiled Jalapeño on June 24, 2026: a custom LLM inference ASIC, manufactured by TSMC and taken from specification to tape-out in nine months, with a claimed 50% lower cost per token than Nvidia GPUs.
OpenAI unveiled its first custom chip on June 24, 2026. Named Jalapeño, the inference-only ASIC was designed by OpenAI, built in partnership with Broadcom, and manufactured by TSMC, going from architecture specification to tape-out in nine months. Broadcom CEO Hock Tan told Reuters it may be the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors. Early testing shows roughly 50% lower cost per inference token compared to current Nvidia GPU clusters.
This is a significant shift in how the largest AI lab in the world runs its own products. Until today, every ChatGPT query and OpenAI API call ran on Nvidia and Microsoft Azure GPU hardware. Jalapeño does not change the training stack — OpenAI still needs Nvidia for that — but inference is where the operating cost lives. The company spent an estimated $5 billion on inference compute in 2025. A 50% reduction on even a fraction of that traffic is hundreds of millions of dollars annually.
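The arithmetic behind that last claim can be sketched in a few lines. This is a back-of-the-envelope model, not OpenAI's accounting: it uses the estimated $5 billion spend and the claimed 50% reduction quoted above, and assumes that savings scale linearly with the share of traffic moved to Jalapeño. The function name and the traffic shares are illustrative.

```python
def annual_savings_usd(inference_spend_usd: float,
                       traffic_share: float,
                       cost_reduction: float) -> float:
    """Estimate yearly savings from moving a share of inference traffic.

    Assumes cost is proportional to traffic, so savings are simply
    spend * share of traffic moved * per-token cost reduction.
    """
    if not 0.0 <= traffic_share <= 1.0:
        raise ValueError("traffic_share must be between 0 and 1")
    if not 0.0 <= cost_reduction <= 1.0:
        raise ValueError("cost_reduction must be between 0 and 1")
    return inference_spend_usd * traffic_share * cost_reduction


ESTIMATED_2025_SPEND_USD = 5_000_000_000  # estimate quoted in the article
CLAIMED_COST_REDUCTION = 0.50             # early lab-testing claim, not validated

for share in (0.10, 0.25, 0.50):          # hypothetical traffic shares
    saved = annual_savings_usd(ESTIMATED_2025_SPEND_USD, share,
                               CLAIMED_COST_REDUCTION)
    print(f"{share:.0%} of traffic on Jalapeño -> "
          f"~${saved / 1e6:,.0f}M saved per year")
```

Under these assumptions, moving even 10% of traffic would save about $250 million a year, which is where the "hundreds of millions" figure comes from.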
What Is the OpenAI Jalapeño Chip?
Jalapeño is a custom application-specific integrated circuit designed exclusively for running large language model inference. It handles the process of generating text outputs in response to user queries — what happens every time someone sends a message to ChatGPT or calls the OpenAI API.
Unlike Nvidia GPUs, which carry compute capacity optimized across training, inference, rendering, and scientific simulation, Jalapeño is tuned entirely around the memory-movement patterns, kernel execution, and networking requirements of transformer-based models. That specialization is the source of the cost reduction. Silicon that does one job efficiently costs less per operation than silicon designed to do everything adequately.
The chip is manufactured by TSMC, the same foundry that makes Apple's A-series and M-series chips, Nvidia's Hopper and Blackwell GPUs, and Google's TPUs. Jalapeño will not be sold externally. It is OpenAI hardware, built for OpenAI infrastructure.
The 9-Month Development Record
Going from architecture specification to tape-out in nine months is genuinely fast. Production HPC silicon typically takes 18 to 24 months to get through the full ASIC development pipeline. Broadcom has completed shorter timelines for mobile and networking chips, but those are far less architecturally complex than a chip targeting LLM inference performance at data-center scale.
Two factors enabled the speed. First, Broadcom brought an existing LLM-optimized silicon platform that OpenAI could build on rather than starting from first principles. Second, per OpenAI's own announcement, the company used its AI models to accelerate chip design, specifically for verification, layout optimization, and RTL generation. The loop is deliberate: hardware designed with AI assistance runs AI inference.
If that methodology repeats at scale, it matters beyond this chip. The hardware iteration cycle in AI has been a structural bottleneck: model capability advances quarterly, but hardware takes two years to follow. A nine-month ASIC cycle, if it holds, closes that gap substantially and gives OpenAI a tighter loop between model architecture decisions and the silicon they run on.
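One rough way to see the gap is to count how many model release cycles pass while a single chip is in development. The sketch below uses only the figures quoted above (quarterly model advances, a two-year typical hardware cycle, the claimed nine-month cycle). The function name is illustrative, and real release cadences are less regular than this.

```python
def model_cycles_per_chip(hardware_cycle_months: float,
                          model_cadence_months: float = 3.0) -> float:
    """How many model release cycles elapse while one chip is developed."""
    if hardware_cycle_months <= 0 or model_cadence_months <= 0:
        raise ValueError("cycle lengths must be positive")
    return hardware_cycle_months / model_cadence_months


for label, months in (("typical two-year cycle", 24),
                      ("Jalapeño (claimed)", 9)):
    cycles = model_cycles_per_chip(months)
    print(f"{label}: {cycles:.0f} model cycles per chip generation")
```

On these numbers a chip would lag by about three model cycles instead of eight.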
Broadcom and TSMC: How the Partnership Works
Broadcom acts as the silicon design and manufacturing integration partner. OpenAI defines the architecture, memory hierarchy, and inference-specific compute requirements. Broadcom translates that into physical silicon design and coordinates manufacturing with TSMC. OpenAI retains ownership of the resulting IP.
This is structurally similar to how Google built its Tensor Processing Units and how Apple develops M-series chips. The difference is that Google and Apple built their silicon programs over years. OpenAI got there in nine months on what Broadcom is calling a multi-generation program — a roadmap, not a one-off.
The announced target is 10 gigawatts of Jalapeño-based compute deployed across OpenAI and Microsoft data centers by 2029. The next-generation chip is already planned for 2028. OpenAI is building a hardware program with the same cadence expectations as its model program.
How Jalapeño Compares to Current Inference Hardware
| Chip | Type | Inference Cost vs H200 | Training Use | External Sale |
|---|---|---|---|---|
| Nvidia H200 | GPU | Baseline | Yes | Yes |
| Nvidia B200 (Blackwell) | GPU | Broadly similar | Yes | Yes |
| Google TPU v5e | TPU | Competitive | Limited | GCP only |
| Amazon Inferentia 2 | ASIC | ~30-40% below GPU | Limited | AWS only |
| OpenAI Jalapeño | ASIC | ~50% below H200 | No | No |
The 50% figure applies to LLM inference workloads specifically. Broadcom CEO Hock Tan told Bloomberg the chip delivers performance per watt on par with Nvidia Blackwell at roughly 50% lower cost per token. These are pre-production numbers from early lab testing. Final validated benchmarks against Blackwell B200 and Google TPU v5e have not been published.
For training runs, the comparison is irrelevant. Jalapeño was designed for inference. OpenAI's training infrastructure continues running on Nvidia GPU clusters.
What Jalapeño Does to Nvidia
Broadcom (AVGO) rose roughly 2% on the announcement. Nvidia (NVDA) fell 0.26%. The muted reaction reflects a real assessment: Jalapeño is an inference chip. Nvidia's dominant position in AI training — the bigger contract value — is untouched.
But the direction matters. Every major AI platform company is now running or building custom AI silicon:
- Google: TPU v5 and v6 family
- Meta: MTIA inference chip, deployed at scale
- Amazon: Inferentia 2 and Trainium
- Microsoft: Maia AI accelerator
- Apple: Neural Engine and A-series silicon
- OpenAI: Jalapeño (announced June 24, 2026)
Nvidia's total addressable inference market is under pressure from every direction simultaneously. The company's response will likely center on NVLink networking, the CUDA software ecosystem, and workloads where LLM-specific inference ASICs offer little advantage, such as fine-tuning, vision tasks, and audio processing.
When Will Developers See Cheaper API Costs?
Jalapeño enters small prototype deployments by end of 2026. Full production ramp is 2027 through 2028. For the rest of 2026, the OpenAI API continues running on existing Nvidia GPU clusters. Developers will not notice a change this year.
The path to cheaper API calls runs through two stages. First, Jalapeño needs to reach production-validated performance numbers that match the lab testing claims. Second, OpenAI needs to route live API traffic through Jalapeño endpoints at scale. Both are expected in 2027.
OpenAI's flagship-model pricing has already fallen dramatically: GPT-4 launched in 2023 at $30 per million input tokens, while GPT-4o is priced under $2.50 today. Jalapeño extends the margin that enables further cuts. A 30-40% API price reduction in H1 2027 is plausible if production performance validates the lab claims.
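To put those numbers side by side, here is a small sketch of the arithmetic. It treats $2.50 per million tokens as the current price (the article says "under $2.50", so this is an upper bound) and applies the 30-40% cut as a scenario, not a forecast. The function names are illustrative.

```python
def price_after_cut(price_per_million: float, cut: float) -> float:
    """Price per million tokens after a fractional price cut."""
    if not 0.0 <= cut < 1.0:
        raise ValueError("cut must be in [0, 1)")
    return price_per_million * (1.0 - cut)


def percent_decline(old_price: float, new_price: float) -> float:
    """Percentage drop from old_price to new_price."""
    if old_price <= 0:
        raise ValueError("old_price must be positive")
    return (old_price - new_price) / old_price * 100.0


EARLIER_PRICE = 30.00  # USD per million tokens, figure quoted in the article
CURRENT_PRICE = 2.50   # USD per million tokens, upper bound from the article

print(f"Decline so far: {percent_decline(EARLIER_PRICE, CURRENT_PRICE):.1f}%")
for cut in (0.30, 0.40):  # scenario range discussed in the article
    new_price = price_after_cut(CURRENT_PRICE, cut)
    print(f"After a {cut:.0%} cut: ${new_price:.2f} per million tokens")
```

The historical drop works out to roughly 92%, and a further 30-40% cut would put the price at about $1.50 to $1.75 per million tokens.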
Agentic and Codex workloads benefit first. These are inference-intensive and latency-sensitive — sequential inference calls per task with tight latency budgets. If Jalapeño's memory bandwidth advantage translates to lower token generation latency alongside the cost reduction, Codex agents get faster and cheaper simultaneously.
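Why sequential workloads feel both effects at once can be shown with a toy model. Every number below is a made-up placeholder, not a measurement or a published price, and the model ignores input tokens, prefill time, and network overhead. The `AgentTask` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentTask:
    """One agent task made of strictly sequential inference calls."""
    calls: int
    output_tokens_per_call: int
    usd_per_million_tokens: float
    seconds_per_token: float

    def total_tokens(self) -> int:
        return self.calls * self.output_tokens_per_call

    def cost_usd(self) -> float:
        return self.total_tokens() * self.usd_per_million_tokens / 1_000_000

    def latency_seconds(self) -> float:
        # Calls run one after another, so generation time adds up.
        return self.total_tokens() * self.seconds_per_token


# Hypothetical scenarios: same task, different price and per-token speed.
baseline = AgentTask(calls=20, output_tokens_per_call=500,
                     usd_per_million_tokens=10.0, seconds_per_token=0.020)
improved = AgentTask(calls=20, output_tokens_per_call=500,
                     usd_per_million_tokens=7.0, seconds_per_token=0.015)

for name, task in (("baseline", baseline), ("hypothetical", improved)):
    print(f"{name}: ${task.cost_usd():.3f} per task, "
          f"{task.latency_seconds():.0f} s of generation time")
```

Because cost and latency both scale with total tokens generated, any per-token improvement is multiplied by the number of calls in the chain. That is why long agentic chains would see the largest absolute gains.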
Our Analysis: The Hardware Autonomy Play
The 50% inference cost reduction is the headline number, but it carries a caveat: early lab testing under controlled LLM workloads is not the same as production performance across the full distribution of ChatGPT queries, which range from single-token yes/no responses to 100,000-token document processing.
The nine-month development timeline is the more durable signal.
If OpenAI and Broadcom can repeatably tape out production-grade AI ASICs on a nine-month cycle, the implications extend past inference costs. Model architecture decisions that once had to be backward-compatible with hardware built two years earlier could instead coevolve with new chips. OpenAI could design chips tuned to specific model generations: Jalapeño's successor, for example, could be optimized for whatever architecture powers GPT-6 rather than for what GPT-5 requires.
The geopolitical dimension is also worth noting. Jalapeño technology and manufacturing cannot be exported to China under current US export controls. This means the inference cost gap between US-operated AI infrastructure and Chinese-operated AI infrastructure — which is building on Huawei Ascend and domestic ASIC programs — will likely widen over the next five years. China's $295 billion AI data center program is betting on domestic silicon because custom US chips like Jalapeño will remain unavailable to Chinese operators regardless of price.
For developers deciding which API to build on, Jalapeño strengthens the OpenAI infrastructure value case. Comparisons between model providers will shift materially if API prices fall 30-40% in 2027 as the chip ramps.
Key Takeaways
- 50% lower inference cost per token than Nvidia GPUs, per Broadcom CEO statements to Reuters and Bloomberg — based on early lab testing, not production benchmarks
- 9 months from design spec to tape-out — claimed as fastest high-performance ASIC development cycle on record
- TSMC-manufactured, OpenAI-designed, Broadcom-partnered — same foundry chain as Apple M-series and Nvidia Blackwell
- 10 gigawatts of Jalapeño compute targeted across OpenAI and Microsoft data centers by 2029
- Inference only — not for training workloads, not for external sale; benefit comes via cheaper OpenAI API calls
- For developers: watch for OpenAI API price cuts in H1 2027 as Jalapeño production ramps; Codex and agentic pipelines benefit most from lower per-token cost
- What to watch: final benchmark comparisons against Blackwell B200 and Google TPU v5e, expected in late 2026 alongside prototype deployments
Frequently Asked Questions
What is the OpenAI Jalapeño chip?
Jalapeño is a custom ASIC for LLM inference, designed by OpenAI with Broadcom and manufactured by TSMC. Unveiled on June 24, 2026, it claims roughly 50% lower cost per token than Nvidia GPUs. Small prototype deployments are targeted for the end of 2026, with ChatGPT and OpenAI API traffic moving onto it as production ramps in 2027 and 2028. It is not for external sale.
Can developers buy or access the OpenAI Jalapeño chip?
No. Jalapeño is built exclusively for OpenAI's own infrastructure and will not be sold or offered via any cloud marketplace. Developers benefit indirectly through lower OpenAI API pricing and potentially faster response times as the chip ramps in 2027 and 2028.
How does Jalapeño compare to Nvidia GPUs for inference?
For LLM inference specifically, early testing shows Jalapeño delivers performance per watt on par with Nvidia Blackwell at roughly 50% lower cost per token. For training workloads, Jalapeño does not apply — OpenAI continues using Nvidia and Azure GPU clusters for all model training.
Why did OpenAI partner with Broadcom to build a custom chip?
OpenAI spent an estimated $5 billion on inference compute in 2025. Custom silicon tuned for transformer inference achieves substantially better efficiency than general-purpose GPUs. Broadcom provided silicon design expertise and TSMC access that allowed OpenAI to go from design to tape-out in nine months rather than the typical 18-24 months.
When will OpenAI Jalapeño chips be deployed and affect API pricing?
Small prototype deployments are targeted for end of 2026, with full production ramp in 2027 and 2028. API price reductions tied to Jalapeño cost savings are most likely in H1 2027. The next-generation chip is already planned for 2028, and 10 gigawatts of total Jalapeño compute is targeted by 2029.
Written by
Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 969+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.
