Amazon Kills AI Leaderboard After Engineers Inflate Token Bills

Abhishek Gautam | 9 min read

Quick summary

Kirorank ranked staff on AI usage until tokenmaxxing drove up compute spend in a year when Amazon plans roughly $200B in capex. Amazon now tracks shipped code, not tokens.

Amazon took its internal Kirorank AI leaderboard offline on May 29, 2026, after employees inflated usage scores by running pointless tasks through AI agents, a practice staff called tokenmaxxing. The beta dashboard tracked activity on Amazon's Kiro developer platform and had been tied to a push for more than 80% of developers to use AI weekly, while Amazon plans roughly $200 billion in 2026 capital spending, mostly on AI and data centers. Senior vice president Dave Treadwell told staff not to use AI for its own sake. Amazon now emphasizes normalized deployments, meaning AI-assisted code that actually ships, rather than raw token burn.

What was Kirorank and why did Amazon kill it?

Kirorank was an internal scoring system that ranked employees by AI tool usage on Kiro, Amazon's AI-forward developer environment. Financial Times reporting, cited widely on May 29, said workers gamed the board by assigning low-value work to agents via Kiro, MeshClaw, and related internal tools to climb rankings. That behavior raised cloud compute bills without improving products.

Amazon confirmed the dashboard was not a formal or approved tool and has been deprecated. The company framed the shutdown as cost control and anti-gaming, not a retreat from AI adoption.

What is tokenmaxxing inside enterprises?

Tokenmaxxing is the practice of gaming LLM usage metrics: running verbose refactors, auto-replying to low-priority email, or spawning agent loops whose output nobody merges, solely because a leaderboard rewards token volume. It mirrors engagement hacking on social media, but the spend lands on real GPU budgets.

Meta reportedly saw a similar pattern when internal AI usage scores became career signals. Amazon's response, normalized deployments, is the right leading indicator: did AI help ship vetted code to production?

The $200B capex tension behind the headline

Amazon's 2026 capex story is dominated by AI infrastructure, the same pressure behind its multibillion-dollar bets on Anthropic and its Trainium chips. Letting tens of thousands of engineers inflate token counts for leaderboard points is how a strategic investment turns into an operating-expense fire.

Finance and platform teams elsewhere will take the same lesson: never publish a single-metric AI adoption KPI without guardrails tied to productive output.

What normalized deployments means for engineering managers

Normalized deployments measure how often developers use AI to produce useful, merged code, not how many tokens they consume in Slack or email triage. That metric is harder to game and closer to value.
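Amazon has not published how it computes normalized deployments, so the following is only a minimal sketch under assumed definitions: a change counts if it was AI-assisted, merged, and deployed, and the total is divided by developer-weeks. The `Change` record and `normalized_deployments` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One code change, e.g. a pull request (hypothetical schema)."""
    author: str
    ai_assisted: bool  # an AI tool contributed to this change
    merged: bool       # the change passed review and merged
    deployed: bool     # the change reached production


def normalized_deployments(changes: list[Change], developers: int, weeks: int) -> float:
    """AI-assisted changes that merged AND deployed, per developer per week."""
    if developers <= 0 or weeks <= 0:
        raise ValueError("developers and weeks must be positive")
    shipped = sum(1 for c in changes if c.ai_assisted and c.merged and c.deployed)
    return shipped / (developers * weeks)


changes = [
    Change("ana", ai_assisted=True, merged=True, deployed=True),
    Change("ana", ai_assisted=True, merged=False, deployed=False),  # agent output nobody merged
    Change("raj", ai_assisted=True, merged=True, deployed=True),
    Change("raj", ai_assisted=False, merged=True, deployed=True),   # shipped, but no AI involved
]
print(normalized_deployments(changes, developers=2, weeks=1))  # 1.0
```

Dividing by developer-weeks is one plausible normalization; a team could just as reasonably divide by total merged changes to get a share rather than a rate. Either way, tokens never enter the numerator.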

For individual contributors, the implication is blunt: your org will notice if Copilot, Kiro, or Claude Code sessions do not correlate with PRs that pass review. Vanity agent hours are now a cost center risk.
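One way an org might spot that mismatch is to flag agent sessions that consumed many tokens but fed no merged pull request. This is a hypothetical heuristic, not a documented Amazon or vendor feature; `AgentSession`, `flag_vanity_sessions`, and the 100,000-token threshold are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSession:
    """One AI agent session (hypothetical schema)."""
    session_id: str
    user: str
    tokens: int
    merged_pr_ids: list[str] = field(default_factory=list)  # PRs this session's output landed in


def flag_vanity_sessions(sessions: list[AgentSession], min_tokens: int = 100_000) -> list[str]:
    """Return IDs of sessions that used at least min_tokens but contributed to no merged PR."""
    return [s.session_id for s in sessions if s.tokens >= min_tokens and not s.merged_pr_ids]


sessions = [
    AgentSession("s1", "ana", 250_000, ["PR-101"]),  # heavy use, but it shipped
    AgentSession("s2", "ana", 400_000),              # heavy use, nothing merged
    AgentSession("s3", "raj", 20_000),               # light use, below threshold
]
print(flag_vanity_sessions(sessions))  # ['s2']
```

A flag like this should start a conversation rather than trigger a penalty: exploratory or research sessions can be valuable without producing a PR.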

Lessons for every team running AI coding tools

Do not rank engineers on raw token usage. If you must track adoption, pair usage with merge rate, defect rate, and cycle time.
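As a rough illustration of pairing usage with outcomes, this sketch reports token spend next to merge rate, defect rate, and median cycle time for AI-assisted pull requests. The `PullRequest` fields, the `team_scorecard` name, and the choice to measure defects and cycle time only on merged PRs are assumptions, not an established standard.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    """Minimal PR record (hypothetical schema)."""
    ai_assisted: bool
    merged: bool
    caused_incident: bool  # linked to a production defect after merge
    cycle_hours: float     # time from opening to merge (ignored if never merged)


def team_scorecard(prs: list[PullRequest], tokens_used: int) -> dict:
    """Report token usage alongside outcome metrics for AI-assisted PRs."""
    ai_prs = [p for p in prs if p.ai_assisted]
    merged = [p for p in ai_prs if p.merged]
    return {
        "tokens_used": tokens_used,
        "ai_pr_count": len(ai_prs),
        "merge_rate": len(merged) / len(ai_prs) if ai_prs else 0.0,
        "defect_rate": sum(p.caused_incident for p in merged) / len(merged) if merged else 0.0,
        "median_cycle_hours": median(p.cycle_hours for p in merged) if merged else None,
        "tokens_per_merged_pr": tokens_used / len(merged) if merged else None,
    }


prs = [
    PullRequest(ai_assisted=True, merged=True, caused_incident=False, cycle_hours=10.0),
    PullRequest(ai_assisted=True, merged=True, caused_incident=True, cycle_hours=30.0),
    PullRequest(ai_assisted=True, merged=False, caused_incident=False, cycle_hours=5.0),
    PullRequest(ai_assisted=False, merged=True, caused_incident=False, cycle_hours=8.0),
]
print(team_scorecard(prs, tokens_used=1_200_000))
```

Reporting `tokens_per_merged_pr` instead of raw tokens means a team that burns more tokens only looks worse if those tokens fail to turn into merged work.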

Cap autonomous agent loops in CI and internal bots unless outputs attach to tickets with owners.
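A minimal sketch of that cap, assuming a hypothetical `Ticket` record and a `step` callback that runs one agent iteration and returns `True` when finished: the loop refuses to start without an owned ticket and stops after a fixed number of iterations. Real CI systems and agent frameworks expose their own hooks for this; none of the names here come from Kiro or any specific tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Ticket:
    """Work item an agent run must attach to (hypothetical schema)."""
    key: str
    owner: Optional[str]


def run_agent_loop(ticket: Optional[Ticket], step: Callable[[int], bool], max_iterations: int = 5) -> int:
    """Run step(i) until it reports done; return the number of iterations used.

    Refuses to start without an owned ticket and raises if the cap is reached.
    """
    if ticket is None or not ticket.owner:
        raise PermissionError("agent loop needs a ticket with an owner")
    for i in range(1, max_iterations + 1):
        if step(i):  # step returns True once the agent reports it is finished
            return i
    raise RuntimeError(f"agent loop stopped at cap of {max_iterations} iterations")


print(run_agent_loop(Ticket("OPS-42", owner="ana"), step=lambda i: i == 3))  # 3
```

A ticket with `owner=None` raises `PermissionError` before any tokens are spent, and a step that never finishes raises `RuntimeError` after five iterations instead of looping indefinitely.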

Charge token spend back to cost centers so teams see the marginal cost of leaderboard chasing.
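Chargeback can be as simple as multiplying each cost center's input and output tokens by per-token prices and summing. The prices in this sketch are placeholders, not any vendor's actual rates, and the `(cost_center, input_tokens, output_tokens)` record shape and `chargeback` name are assumed for illustration.

```python
from collections import defaultdict

# Placeholder prices in USD per million tokens. These are NOT real vendor rates;
# substitute your provider's current pricing.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def chargeback(records, prices=PRICE_PER_MTOK):
    """Sum token spend per cost center.

    Each record is a (cost_center, input_tokens, output_tokens) tuple.
    """
    totals = defaultdict(float)
    for cost_center, input_tokens, output_tokens in records:
        cost = (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000
        totals[cost_center] += cost
    return dict(totals)


records = [
    ("payments", 2_000_000, 500_000),
    ("payments", 1_000_000, 100_000),
    ("search", 10_000_000, 4_000_000),  # hypothetical heavy user
]
print(chargeback(records))  # {'payments': 18.0, 'search': 90.0}
```

Output tokens are usually priced higher than input tokens, so verbose agent loops that generate long outputs show up disproportionately on the bill.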

Use the LLM API pricing tracker at /tools/llm-api-pricing to model spend before internal competitions go viral.

Align with Amazon's public caution: Treadwell's warning against using AI for the sake of AI is the enterprise version of warning engineers against resume-driven microservices.

Connection to broader AI economics

Uber reportedly exhausted its 2026 AI budget by April with little consumer-facing impact, per News18 summaries of industry reporting. Amazon's leaderboard shutdown is the same story at a different scale: usage without outcomes is unsustainable when inference is metered.

Anthropic's $965B valuation and OpenAI's $852B valuation assume revenue scales with productive use, not vanity tokens. Enterprise buyers will demand metrics modeled on normalized deployments in vendor reviews within a year.

Key Takeaways

  • May 29, 2026: Amazon deprecated Kirorank, an internal AI usage leaderboard tied to Kiro
  • Tokenmaxxing inflated scores via pointless agent tasks, spiking compute costs
  • Amazon targets 80%+ weekly developer AI use while planning ~$200B 2026 capex weighted to AI
  • New success metric: normalized deployments (useful shipped code) replaces raw usage leaderboards
  • For developers: expect employers to gate AI metrics on production outcomes, not token volume
  • What to watch: whether AWS productizes deployment-quality analytics for enterprise customers

Frequently asked questions

What is Amazon Kirorank?

Kirorank was an internal beta dashboard that scored Amazon employees on AI activity on the Kiro developer platform. Amazon shut it down on May 29, 2026, after workers gamed it through tokenmaxxing.

What is tokenmaxxing at Amazon?

Tokenmaxxing means running low-value tasks through AI agents primarily to increase usage scores and leaderboard rank, which raised compute spending without improving products.

Why did Amazon remove the AI leaderboard?

Amazon said the dashboard was not a formal or approved tool and that it encouraged misuse that increased costs. The company shifted its focus to normalized deployments, which measure useful code output instead.

What did Dave Treadwell tell employees?

Treadwell reportedly urged staff not to use AI just for the sake of using it, acknowledging the leaderboard had good intentions but created extra cost and perverse incentives.

What should engineering teams learn from Kirorank?

Do not reward raw token usage. Tie AI adoption metrics to merged code, incident rates, and cycle time, and cap agent automation that lacks ticket owners.

How much is Amazon spending on AI infrastructure?

Reporting in May 2026 cited roughly $200 billion in planned 2026 capital expenditure, with most directed toward AI systems and data center expansion.

What metric replaces Kirorank?

Amazon is focusing on normalized deployments, measuring how often AI helps produce useful code that reaches production, rather than raw token or activity scores.


Written by Abhishek Gautam

Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics, covering the intersection of infrastructure and global events. 952+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.