OpenAI Unveils Jalapeño: First Custom AI Chip

Abhishek Gautam | June 24, 2026 | 11 min read

Quick summary

OpenAI and Broadcom unveiled Jalapeño on June 24, 2026: a custom LLM inference ASIC, taken from design to tape-out in nine months and manufactured by TSMC, that is claimed to deliver 50% lower cost per token than Nvidia GPUs.

What Is the OpenAI Jalapeño Chip?

Jalapeño is a custom application-specific integrated circuit designed exclusively for running large language model inference. It handles the process of generating text outputs in response to user queries — what happens every time someone sends a message to ChatGPT or calls the OpenAI API.

Unlike Nvidia GPUs, which carry compute capacity optimized across training, inference, rendering, and scientific simulation, Jalapeño is tuned entirely around the memory-movement patterns, kernel execution, and networking requirements of transformer-based models. That specialization is the source of the cost reduction. Silicon that does one job efficiently costs less per operation than silicon designed to do everything adequately.

The chip is manufactured by TSMC, the same foundry that makes Apple's A-series and M-series chips, Nvidia's Hopper and Blackwell GPUs, and Google's TPUs. Jalapeño will not be sold externally. It is OpenAI hardware, built for OpenAI infrastructure.

The 9-Month Development Record

Going from architecture specification to tape-out in nine months is genuinely fast. Production HPC silicon typically takes 18 to 24 months to move through the full ASIC development pipeline. Broadcom has completed shorter timelines for mobile and networking chips, but those are far less architecturally complex than a chip targeting LLM inference performance at data-center scale.

Two factors enabled the speed. First, Broadcom brought an existing LLM-optimized silicon platform that OpenAI could build on rather than starting from first principles. Second, and per OpenAI's own announcement, the company used its AI models to accelerate chip design — specifically for verification, layout optimization, and RTL generation. The irony is deliberate: AI-designed hardware runs AI inference.

If that methodology repeats at scale, it matters beyond this chip. The hardware iteration cycle in AI has been a structural bottleneck: model capability advances quarterly, but hardware takes two years to follow. A nine-month ASIC cycle, if it holds, closes that gap substantially and gives OpenAI a tighter loop between model architecture decisions and the silicon they run on.
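The cadence gap can be made concrete with back-of-the-envelope arithmetic. The sketch below uses only the figures quoted above (quarterly model advances, an 18-to-24-month typical ASIC cycle, a nine-month Jalapeño cycle). The function name and the assumption of a steady three-month model cadence are illustrative, not anything OpenAI or Broadcom has published.

```python
def model_releases_per_chip_cycle(chip_cycle_months: float,
                                  model_cadence_months: float = 3.0) -> float:
    """Model releases that ship during one hardware design cycle.

    Assumes a steady release cadence (hypothetical simplification).
    """
    if chip_cycle_months <= 0 or model_cadence_months <= 0:
        raise ValueError("cycle lengths must be positive")
    return chip_cycle_months / model_cadence_months


# Cycle lengths in months, taken from the figures quoted in the article.
cycles = {
    "typical ASIC cycle, upper bound": 24,
    "typical ASIC cycle, lower bound": 18,
    "claimed Jalapeño cycle": 9,
}

for label, months in cycles.items():
    releases = model_releases_per_chip_cycle(months)
    print(f"{label} ({months} months): about {releases:.0f} model releases")
```

Under these assumptions, a chip built on the traditional schedule is six to eight model releases behind by the time it tapes out, while a nine-month cycle cuts that to about three.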

Broadcom and TSMC: How the Partnership Works

Broadcom acts as the silicon design and manufacturing integration partner. OpenAI defines the architecture, memory hierarchy, and inference-specific compute requirements. Broadcom translates that into physical silicon design and coordinates manufacturing with TSMC. OpenAI retains ownership of the resulting IP.

This is structurally similar to how Google built its Tensor Processing Units and how Apple develops M-series chips. The difference is that Google and Apple built their silicon programs over years. OpenAI got there in nine months on what Broadcom is calling a multi-generation program — a roadmap, not a one-off.

The announced target is 10 gigawatts of Jalapeño-based compute deployed across OpenAI and Microsoft data centers by 2029. The next-generation chip is already planned for 2028. OpenAI is building a hardware program with the same cadence expectations as its model program.

How Jalapeño Compares to Current Inference Hardware

Chip	Type	Inference Cost vs H200	Training Use	External Sale
Nvidia H200	GPU	Baseline	Yes	Yes
Nvidia B200 (Blackwell)	GPU	Broadly similar	Yes	Yes
Google TPU v5e	TPU	Competitive	Limited	GCP only
Amazon Inferentia 2	ASIC	~30-40% below GPU	Limited	AWS only
OpenAI Jalapeño	ASIC	~50% below H200	No	No

The 50% figure applies to LLM inference workloads specifically. Broadcom CEO Hock Tan told Bloomberg the chip delivers performance per watt on par with Nvidia Blackwell at roughly 50% lower cost per token. These are pre-production numbers from early lab testing. Final validated benchmarks against Blackwell B200 and Google TPU v5e have not been published.

For training runs, the comparison is irrelevant. Jalapeño was designed for inference. OpenAI's training infrastructure continues running on Nvidia GPU clusters.

What Jalapeño Does to Nvidia

Broadcom (AVGO) rose roughly 2% on the announcement. Nvidia (NVDA) fell 0.26%. The muted reaction reflects a real assessment: Jalapeño is an inference chip. Nvidia's dominant position in AI training — the bigger contract value — is untouched.

But the direction matters. Every major foundation model lab is now running or building custom inference silicon:

Google: TPU v5 and v6 family
Meta: MTIA inference chip, deployed at scale
Amazon: Inferentia 2 and Trainium
Microsoft: Maia AI accelerator
Apple: Neural Engine and A-series silicon
OpenAI: Jalapeño (announced June 24, 2026)

Nvidia's total addressable inference market is under pressure from every direction simultaneously. The company's response will likely center on NVLink networking, the CUDA software ecosystem, and workloads where specialized LLM inference ASICs offer no advantage, such as fine-tuning, vision tasks, and audio processing.

When Will Developers See Cheaper API Costs?

Jalapeño enters small prototype deployments by end of 2026. Full production ramp is 2027 through 2028. For the rest of 2026, the OpenAI API continues running on existing Nvidia GPU clusters. Developers will not notice a change this year.

The path to cheaper API calls runs through two stages. First, Jalapeño needs to reach production-validated performance numbers that match the lab testing claims. Second, OpenAI needs to route live API traffic through Jalapeño endpoints at scale. Both are expected in 2027.

Flagship model pricing has already fallen dramatically, from $30 per million input tokens for GPT-4 in mid-2023 to under $2.50 for GPT-4o today. Jalapeño extends the margin that enables further cuts. A 30-40% API price reduction in H1 2027 is plausible if production performance validates the lab claims. Track current costs across all providers with the LLM API Pricing Tracker.
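One way to see how a 50% per-token saving on the chip relates to a 30-40% reduction overall is to look at how much traffic has to move. This is a rough sketch, not OpenAI's pricing model: it assumes traffic still on GPUs costs the same as today, that the lab-tested 50% saving holds in production, and that price tracks serving cost. The function names are hypothetical.

```python
def blended_cost_reduction(traffic_share: float,
                           per_token_saving: float = 0.50) -> float:
    """Fleet-wide drop in cost per token when only part of traffic moves.

    traffic_share: fraction of tokens served on the new chip (0 to 1).
    per_token_saving: claimed saving on that traffic (0.50 = 50%, unvalidated).
    """
    if not 0.0 <= traffic_share <= 1.0:
        raise ValueError("traffic_share must be between 0 and 1")
    if not 0.0 < per_token_saving <= 1.0:
        raise ValueError("per_token_saving must be in (0, 1]")
    return traffic_share * per_token_saving


def traffic_share_needed(target_reduction: float,
                         per_token_saving: float = 0.50) -> float:
    """Share of traffic that must move to reach a target blended reduction."""
    if not 0.0 < per_token_saving <= 1.0:
        raise ValueError("per_token_saving must be in (0, 1]")
    if not 0.0 <= target_reduction <= per_token_saving:
        raise ValueError("target cannot exceed the per-token saving")
    return target_reduction / per_token_saving


for target in (0.30, 0.40):
    share = traffic_share_needed(target)
    print(f"{target:.0%} lower blended cost needs about {share:.0%} of traffic moved")
```

Under these assumptions, a 30-40% drop in blended serving cost requires roughly 60-80% of token traffic to run on Jalapeño, which is why the size of any price cut depends on how quickly production ramps.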

Agentic and Codex workloads benefit first. These are inference-intensive and latency-sensitive — sequential inference calls per task with tight latency budgets. If Jalapeño's memory bandwidth advantage translates to lower token generation latency alongside the cost reduction, Codex agents get faster and cheaper simultaneously.
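Because agentic tasks chain inference calls one after another, per-token cost and per-call latency both multiply by the number of steps. The sketch below illustrates that multiplication. Every number in it is a placeholder chosen for illustration (call count, tokens per call, seconds per call, and both price points), and the `Backend` type is hypothetical rather than part of any OpenAI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical serving backend with illustrative cost and latency."""
    name: str
    usd_per_million_tokens: float
    seconds_per_call: float


def task_cost_and_latency(backend: Backend,
                          sequential_calls: int,
                          tokens_per_call: int) -> tuple[float, float]:
    """Return (USD cost, seconds) for one task made of sequential calls."""
    if sequential_calls < 0 or tokens_per_call < 0:
        raise ValueError("counts must be non-negative")
    total_tokens = sequential_calls * tokens_per_call
    cost = total_tokens / 1_000_000 * backend.usd_per_million_tokens
    latency = sequential_calls * backend.seconds_per_call
    return cost, latency


# Placeholder figures: the second backend applies the claimed 50% saving
# and assumes no latency change, since any latency gain is unconfirmed.
gpu = Backend("GPU baseline", usd_per_million_tokens=2.50, seconds_per_call=2.0)
asic = Backend("ASIC at 50% lower cost", usd_per_million_tokens=1.25, seconds_per_call=2.0)

for backend in (gpu, asic):
    cost, latency = task_cost_and_latency(backend, sequential_calls=40, tokens_per_call=2_000)
    print(f"{backend.name}: ${cost:.2f} per task, {latency:.0f} s end to end")
```

The saving per call is small, but it compounds across every step of a task and every task an agent runs, which is why sequential workloads feel a per-token price change more than single-shot chat requests do.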

Our Analysis: The Hardware Autonomy Play

The 50% inference cost reduction is the headline number, but it carries a caveat: early lab testing under controlled LLM workloads is not the same as production performance across the full distribution of ChatGPT queries, which range from single-token yes/no responses to 100,000-token document processing.

The nine-month development timeline is the more durable signal.

If OpenAI and Broadcom can repeatably tape out production-grade AI ASICs on a nine-month cycle, the implications extend past inference costs. Model architecture decisions that once had to be backward-compatible with hardware built two years earlier could instead coevolve with new chips. OpenAI could design chips tuned to specific model generations: Jalapeño's successor, for example, could be optimized for whatever architecture powers GPT-6 rather than for what GPT-5 requires.

The geopolitical dimension is also worth noting. Jalapeño technology and manufacturing cannot be exported to China under current US export controls. This means the inference cost gap between US-operated AI infrastructure and Chinese-operated AI infrastructure — which is building on Huawei Ascend and domestic ASIC programs — will likely widen over the next five years. China's $295 billion AI data center program is betting on domestic silicon because custom US chips like Jalapeño will remain unavailable to Chinese operators regardless of price.

For developers deciding which API to build on, Jalapeño strengthens the OpenAI infrastructure value case. The current model comparison for developers will shift materially if API prices fall 30-40% in 2027 as the chip ramps.

Key Takeaways

50% lower inference cost per token than Nvidia GPUs, per Broadcom CEO statements to Reuters and Bloomberg — based on early lab testing, not production benchmarks
9 months from design spec to tape-out — claimed as fastest high-performance ASIC development cycle on record
TSMC-manufactured, OpenAI-designed, Broadcom-partnered — same foundry chain as Apple M-series and Nvidia Blackwell
10 gigawatts of Jalapeño compute targeted across OpenAI and Microsoft data centers by 2029
Inference only — not for training workloads, not for external sale; benefit comes via cheaper OpenAI API calls
For developers: watch for OpenAI API price cuts in H1 2027 as Jalapeño production ramps; Codex and agentic pipelines benefit most from lower per-token cost
What to watch: final benchmark comparisons against Blackwell B200 and Google TPU v5e, expected in late 2026 alongside prototype deployments

Frequently Asked Questions

What is the OpenAI Jalapeño chip?

Jalapeño is a custom ASIC for LLM inference, designed by OpenAI with Broadcom as the silicon design partner and manufactured by TSMC. Unveiled on June 24, 2026, it is claimed to deliver roughly 50% lower cost per token than Nvidia GPUs. Small prototype deployments are targeted for the end of 2026, with ChatGPT and OpenAI API traffic moving over as production ramps in 2027 and 2028. It is not for external sale.

Can developers buy or access the OpenAI Jalapeño chip?

No. Jalapeño is designed exclusively for internal use in OpenAI data centers and will not be sold or offered via any cloud marketplace. Developers benefit indirectly through lower OpenAI API pricing and potentially faster response times as the chip ramps in 2027 and 2028.

How does Jalapeño compare to Nvidia GPUs for inference?

For LLM inference specifically, early testing shows Jalapeño delivers performance per watt on par with Nvidia Blackwell at roughly 50% lower cost per token. For training workloads, Jalapeño does not apply — OpenAI continues using Nvidia and Azure GPU clusters for all model training.

Why did OpenAI partner with Broadcom to build a custom chip?

OpenAI spent an estimated $5 billion on inference compute in 2025. Custom silicon tuned for transformer inference achieves substantially better efficiency than general-purpose GPUs. Broadcom provided silicon design expertise and TSMC access that allowed OpenAI to go from design to tape-out in nine months rather than the typical 18-24 months.

When will OpenAI Jalapeño chips be deployed and affect API pricing?

Small prototype deployments are targeted for end of 2026, with full production ramp in 2027 and 2028. API price reductions tied to Jalapeño cost savings are most likely in H1 2027. The next-generation chip is already planned for 2028, and 10 gigawatts of total Jalapeño compute is targeted by 2029.
