NVIDIA Nemotron 3 Super: 60% SWE-bench, Best Open Model for Code
Quick summary
NVIDIA Nemotron 3 Super hits 60.47% on SWE-bench — highest open-weight score ever. 120B total, 12B active, 1M context, 5x throughput vs GPT-OSS. Already in CodeRabbit and Greptile.
Read next
- PewDiePie Launches Odysseus: Free Self-Hosted AI vs ChatGPTFelix Kjellberg shipped Odysseus on May 31, 2026 — an open-source, Docker-ready AI workspace with 270+ local models. What developers should know before the hype.
- OpenCode: 160K GitHub Stars, 7.5M Developers — The AI Coding Agent Dominating June 2026OpenCode hits 160,000 GitHub stars and 7.5 million monthly active developers in June 2026. Model-agnostic, LSP-integrated, air-gapped capable — how it compares to Cursor and Claude Code.
NVIDIA released Nemotron 3 Super on March 11, 2026. The headline number is 60.47% on SWE-bench Verified — the highest score any open-weight model has achieved on the benchmark that tests AI's ability to resolve real GitHub issues.
For context: Claude Opus 4.6 scores 80.8% on the same benchmark and GPT-5.4 scores around 75%. Nemotron 3 Super is not beating frontier closed models. What it is doing is beating every other model you can download and run yourself — by a meaningful margin.
This matters because open-weight models have a fundamentally different value proposition than API-accessed models. No per-token costs. No data leaving your infrastructure. No rate limits on private codebases. If Nemotron 3 Super can handle 60% of real GitHub issues autonomously, that's a capable autonomous coding agent you can run on your own hardware.
The Architecture: Why Mamba-Transformer Hybrid Is Different
Most large language models use transformer attention, which has quadratic computational complexity relative to sequence length. Longer context = dramatically higher cost. This is why 1M-token context windows are expensive to run even when the model technically supports them.
Nemotron 3 Super uses a hybrid architecture: interleaved Mamba-2 layers, Mixture-of-Experts (MoE) layers, and select transformer attention layers. The Mamba-2 backbone uses linear-time sequence processing — compute cost scales linearly with context length, not quadratically.
The result: the model can process a 1M-token context window at a cost that doesn't explode the way pure transformer attention would. For developers trying to run codebase-wide analysis, this is the practical difference between "can do this with a single H100" and "needs a multi-GPU cluster."
The Scale Numbers and What They Mean
Nemotron 3 Super has 120 billion total parameters and 12 billion active parameters. The gap between those two numbers is the MoE architecture at work. In each forward pass, only 12B of the 120B parameters activate — the router selects which expert sub-networks handle each token. You get near-120B model quality at roughly 12B inference compute cost.
For throughput comparison:
- 5x higher throughput than GPT-OSS-120B (a comparable-scale open-weight model)
- 7.5x higher throughput than Qwen3.5-122B
- 2-3x wall-clock speedup on structured generation like code and tool calls, via built-in speculative decoding
These are not marginal differences. A 5x throughput advantage means you can run 5 parallel coding agents for the same hardware budget that would run one GPT-OSS-120B agent. For agentic coding workflows where you want multiple parallel code-review or debugging passes, this compounds.
SWE-bench 60.47%: What the Number Actually Means
SWE-bench Verified is one of the most rigorous benchmarks for AI coding capability. It presents real GitHub issues from popular Python repositories — the same issues real contributors resolved — and asks the model to produce a patch that passes the test suite.
There's no memorization shortcut available. The issues are from real production codebases. The test suite validates whether the patch actually fixes the problem, not just whether it looks correct.
60.47% means Nemotron 3 Super resolves more than 3 in every 5 real GitHub issues autonomously. Among open-weight models, the previous best was in the high 50s. Among all models including closed frontier systems, 60.47% sits meaningfully below Claude Opus 4.6 (80.8%) and GPT-5.4 (~75%), but it's no longer in a different category from them.
The practical implication: for code review automation, bug triage, and greenfield feature implementation in constrained contexts, Nemotron 3 Super is capable enough that the "good enough" bar for a self-hosted solution has been crossed.
The Inference Stack: How to Access It
NVIDIA has made Nemotron 3 Super available through several routes, from zero-setup cloud inference to full self-hosted deployment.
Managed inference (no setup required):
- Perplexity Labs API — callable via standard OpenAI-compatible endpoint
- OpenRouter — aggregated access alongside other models
- build.nvidia.com — NVIDIA's own NIM (NVIDIA Inference Microservice) endpoint
Self-hosted:
- HuggingFace model hub — full weights available for download
- NVIDIA NIM container — Docker-compatible deployment with built-in speculative decoding already configured
For most developers evaluating the model, starting with OpenRouter or build.nvidia.com is the fastest path to a working prototype before committing to the infrastructure investment of self-hosting a 120B-parameter model.
Already Integrated: CodeRabbit, Factory, Greptile
Three coding tools have already shipped integrations with Nemotron 3 Super as of March 2026:
CodeRabbit — AI code review tool that comments on pull requests. Nemotron 3 Super is now available as a code review engine alongside Claude Opus 4.6 and GPT-5.4. The throughput advantage means faster PR turnaround at lower cost for high-volume repositories.
Factory — agentic coding platform that implements feature requests end-to-end. Nemotron 3 Super runs as an agent backbone for implementation tasks where users want to self-host the model rather than route through external APIs.
Greptile — codebase Q&A and search tool. Nemotron 3 Super's 1M-token context window is particularly relevant here: Greptile needs to load large code contexts to answer questions about complex codebases, and linear-time sequence processing makes that economical at scale.
The fact that these tools shipped integrations within days of the model release signals that the performance is real and reproducible outside of NVIDIA's own benchmarking environment.
The Caveats Developers Should Know
Training data cutoff. Pre-training data has a cutoff of June 2025. Post-training (instruction tuning) data has a cutoff of February 2026. For code in repositories that evolved significantly after mid-2025, the model may not be aware of new APIs, breaking changes, or community patterns introduced after that date.
English-primary. The model was trained on English and 19 other languages, with 43 programming languages. If your codebase has extensive non-English comments or documentation, performance degrades from the benchmark numbers.
Mamba-2 and attention layer interaction. The hybrid architecture is newer than pure transformers and less battle-tested across diverse deployment configurations. Some inference frameworks don't yet fully optimize for Mamba-2 layers. Benchmark the model on your specific workload before building a production pipeline around benchmark numbers.
Open-weight is not fully open-source. The weights are downloadable, but the training code and full dataset composition are not released. You can run and fine-tune Nemotron 3 Super, but you cannot reproduce the training run.
Key Takeaways
- Nemotron 3 Super scores 60.47% on SWE-bench Verified — the highest open-weight result on record
- 120B total parameters, 12B active via MoE — near-120B quality at 12B inference compute cost
- 1M-token context window with linear-time processing via Mamba-2 backbone
- 5x throughput vs GPT-OSS-120B, 7.5x vs Qwen3.5-122B, 2-3x speedup on code generation
- Available via Perplexity, OpenRouter, build.nvidia.com, and HuggingFace today
- Already integrated into CodeRabbit, Factory, and Greptile
- Best use cases: self-hosted code review automation, agentic coding in private codebases, codebase-wide Q&A
- Caveats: training cutoff June 2025 (pre-training), no full open-source training code, hybrid Mamba architecture less tested in diverse deployment configs
FAQ
Frequently Asked Questions
What is NVIDIA Nemotron 3 Super and what is its benchmark score?
NVIDIA Nemotron 3 Super is a 120B total parameter, 12B active parameter hybrid Mamba-Transformer MoE model released March 11, 2026. It scores 60.47% on SWE-bench Verified — the highest score any open-weight model has achieved on the benchmark, which tests ability to resolve real GitHub issues. Frontier closed models like Claude Opus 4.6 (80.8%) and GPT-5.4 (~75%) still score higher.
How does Nemotron 3 Super compare to GPT and Qwen on speed?
Nemotron 3 Super achieves 5x higher throughput than GPT-OSS-120B and 7.5x higher throughput than Qwen3.5-122B on equivalent hardware. It also delivers a 2-3x wall-clock speedup on structured generation like code and tool calls through built-in speculative decoding.
Can developers run Nemotron 3 Super themselves?
Yes. The weights are available on HuggingFace and through build.nvidia.com. Managed inference is available via Perplexity Labs, OpenRouter, and build.nvidia.com with no setup required. For self-hosted deployment, NVIDIA provides a Docker-compatible NIM container with speculative decoding pre-configured.
What AI coding tools already use Nemotron 3 Super?
CodeRabbit (AI code review on pull requests), Factory (agentic coding platform), and Greptile (codebase Q&A) all shipped Nemotron 3 Super integrations within days of the March 11, 2026 release. The rapid third-party integration indicates the benchmark performance holds outside of NVIDIA's own test environment.
What are the main limitations of Nemotron 3 Super?
Pre-training data has a June 2025 cutoff, so the model may not know APIs or patterns introduced after mid-2025. Open-weight means the weights are downloadable but the training code is not released, unlike fully open-source models. The hybrid Mamba-Transformer architecture is newer and less tested across diverse deployment configurations than pure transformers.
Free Weekly Briefing
The AI & Dev Briefing
One honest email a week — what actually matters in AI and software engineering. No noise, no sponsored content. Read by developers across 30+ countries.
No spam. Unsubscribe anytime.
More on AI Models
All posts →PewDiePie Launches Odysseus: Free Self-Hosted AI vs ChatGPT
Felix Kjellberg shipped Odysseus on May 31, 2026 — an open-source, Docker-ready AI workspace with 270+ local models. What developers should know before the hype.
OpenCode: 160K GitHub Stars, 7.5M Developers — The AI Coding Agent Dominating June 2026
OpenCode hits 160,000 GitHub stars and 7.5 million monthly active developers in June 2026. Model-agnostic, LSP-integrated, air-gapped capable — how it compares to Cursor and Claude Code.
OpenAI GPT-5.5 Released: Agentic Coding and Multi-Step Reasoning Upgrade
OpenAI released GPT-5.5 on April 23-24 2026. Stronger agentic coding, multi-step reasoning chains. Rolling to ChatGPT Plus, Pro, Enterprise. API access coming soon.
DeepSeek V4 Pro: 1.6T Parameters, Beats Claude on Coding, Open-Source
DeepSeek V4 Pro released April 2026: 1.6T parameters, 1M token context, Terminal-Bench 67.9% vs Claude 65.4%, LiveCodeBench 93.5% vs 88.8%, SWE-bench 80.6%. Fully open-source.
Written by
Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 952+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.
