Enterprise AI coding case studies
Plus, key takeaways to help you level up fast.
Welcome, executives and professionals. For years, many asked: is AI actually delivering ROI, or is it hype?
Then AI coding emerged as the breakthrough enterprise use case. But the gap between AI leaders and laggards is compounding fast.
So I reviewed hundreds of enterprise agentic engineering case studies and distilled the key takeaways behind what’s driving value at scale.
Goldman Sachs - Devin (autonomous coding agent)
3–4× productivity gain vs. prior AI tools; an IPO prospectus in minutes vs. 2 weeks for a 6-person team (95% completion rate)
Deployed autonomous AI coders alongside ~12,000 human developers. CTO Marco Argenti positioned agents as 'digital employees' handling full software lifecycles: writing, debugging, and deploying. Started with hundreds of agents, scaling to thousands.
Spotify - Honk (internal agent, built on Claude Code)
~90% reduction in engineering time; 650+ AI-generated code changes shipped/month; ~50% of all Spotify updates now AI-generated
Senior engineers delegate complete coding tasks to AI agents while retaining oversight of architecture and review. Shifted the bottleneck from engineering capacity to 'the amount of change consumers are comfortable with.'
Microsoft + Accenture + Fortune 100 (RCT) - GitHub Copilot
26.08% increase in completed tasks across 4,867 developers (randomized controlled trial); junior developers showed the highest gains.
MIT-published field experiments with random assignment of AI coding assistants. Measured via GitHub pull requests and commits as proxies for completed units of work. Confirmed causal (not just correlational) productivity effects at scale.
Shopify - GitHub Copilot + Cursor + Claude Code (multi-tool stack)
CEO cites 100× output on select tasks; headcount held flat despite growth; AI usage baked into performance reviews
April 2025 all-hands mandate: all teams must demonstrate why AI cannot do the job before requesting headcount. Pre-tooled infrastructure: internal LLM proxy, 24+ MCP servers, Copilot/Cursor/Claude Code. CEO Tobi Lütke personally ships code again using agentic coding tools.
Cognizant - Claude for Enterprise + Claude Code
AI-assisted coding available for 350,000 associates; accelerated coding, testing, documentation, and DevOps workflows at enterprise scale.
Aligned engineering platforms with Anthropic capabilities including Claude Code, MCP, and the Agent SDK. Enables clients to integrate AI with existing data/apps, orchestrate multi-step work with human oversight, and manage performance, risk, and spend.
Accenture - GitHub Copilot
55% faster task completion in controlled experiment; 73% of users report completing tasks faster; average 14 min/day saved (22% save 30+ min/day)
Extensive randomized controlled trial with Accenture developers across engineering, design, and testing. Used pull requests and builds as outcome proxies. Showed gains across all experience levels, with statistical controls for confounders.
NVIDIA - OpenAI Codex (GPT-5.5)
10,000+ NVIDIANs using Codex across engineering, product, legal, HR and sales; debugging cycles from days to hours; weeks of experimentation now overnight.
NVIDIA IT provisioned dedicated cloud VMs per employee as secure sandboxes, with zero-data-retention policy and read-only CLI access to production systems via "Skills" automation workflows — giving Codex structured, auditable access at enterprise scale.
Amazon (AWS internal) - Amazon Q Developer + internal AI tooling
15.9% year-over-year reduction in total cost of delivering software units (not just coding speed—full SDLC measurement)
Applied Theory of Constraints to find that accelerating coding alone doesn't move the system. Measured total economic cost per software delivery unit. Used Amazon Q Developer for agentic coding in VS Code and JetBrains, plus automated test creation and build error resolution.
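To make the "total cost per software delivery unit" idea concrete, here is a minimal Python sketch of the arithmetic. Everything in it is an assumption for illustration: the `YearTotals` type, the function names and the dollar figures are made up and do not represent Amazon's internal method or data.

```python
from dataclasses import dataclass


@dataclass
class YearTotals:
    """Hypothetical yearly totals for one engineering organization."""
    total_cost: float      # all SDLC spend: people, tooling, infrastructure
    units_delivered: int   # whatever the organization counts as a delivered unit


def cost_per_unit(year: YearTotals) -> float:
    """Total cost divided by delivered units (the whole system, not coding speed alone)."""
    if year.units_delivered <= 0:
        raise ValueError("units_delivered must be positive")
    return year.total_cost / year.units_delivered


def yoy_reduction_pct(previous: YearTotals, current: YearTotals) -> float:
    """Percentage drop in cost per unit from the previous year to the current one."""
    before = cost_per_unit(previous)
    after = cost_per_unit(current)
    return (before - after) / before * 100.0


# Made-up numbers purely for illustration.
last_year = YearTotals(total_cost=10_000_000, units_delivered=1_000)
this_year = YearTotals(total_cost=10_500_000, units_delivered=1_250)
print(f"{yoy_reduction_pct(last_year, this_year):.1f}% lower cost per unit")
```

In this toy example total spend rises, yet cost per unit falls because more units ship. That is the kind of effect a system-level measure can show and a coding-speed measure cannot.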
Virgin Atlantic - OpenAI Codex
78–80% reduction in codebase size on legacy refactors; 2-week pieces of work now take 30 minutes; zero P1 defects at mobile app launch on a fixed holiday deadline.
Codex deployed across mobile app delivery, legacy refactoring, and database migrations. Paired with an internal AI champions network and Cambridge Spark AI apprentices. Framework: train first, guardrail second, iterate continuously.
Coinbase - Cursor + GitHub Copilot + Claude Code
40%+ of all daily code now AI-generated (doubled since April 2025); single engineers now refactoring or building new codebases in days instead of months.
CEO mandated adoption company-wide with a hard one-week deadline. DevX team built MCP servers (GitHub, Linear) and standardised Cursor rules. Tracks adoption monthly via lead-time-to-change, deployment frequency, and AI vs human-generated code ratio.
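The three adoption metrics named above can be computed from ordinary delivery records. The Python sketch below is one possible formulation, not Coinbase's implementation: the `Change` record, its fields, the use of a median, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Change:
    """Hypothetical record of one change that reached production."""
    committed_at: datetime
    deployed_at: datetime
    ai_lines: int       # lines attributed to AI tools (attribution method assumed)
    human_lines: int    # lines written by hand


def median_lead_time_hours(changes: list[Change]) -> float:
    """Median hours from commit to production deploy."""
    return median(
        (c.deployed_at - c.committed_at).total_seconds() / 3600 for c in changes
    )


def deploys_per_day(changes: list[Change], window_days: int) -> float:
    """Deployment frequency over a reporting window."""
    return len(changes) / window_days


def ai_code_share(changes: list[Change]) -> float:
    """Fraction of shipped lines attributed to AI tools."""
    ai = sum(c.ai_lines for c in changes)
    total = ai + sum(c.human_lines for c in changes)
    return ai / total if total else 0.0


# Made-up sample: three changes in one week.
sample = [
    Change(datetime(2025, 1, 1, 9), datetime(2025, 1, 1, 15), 40, 60),
    Change(datetime(2025, 1, 2, 9), datetime(2025, 1, 3, 9), 100, 100),
    Change(datetime(2025, 1, 5, 10), datetime(2025, 1, 5, 12), 60, 40),
]
print(median_lead_time_hours(sample), deploys_per_day(sample, 7), ai_code_share(sample))
```

Reporting the three numbers together matters: a rising AI share is only meaningful if lead time and deployment frequency are holding steady or improving alongside it.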
Google - Internal LLM tools (Gemini / proprietary)
LLMs significantly reduced time for large-scale legacy migrations across Google Ads, Search, Workspace, and YouTube; 30%+ of all new Google code now AI-generated (up from 25% six months prior).
Peer-reviewed at ICSE 2025. Combines LLM-based change location discovery with automated code generation, applied to 20+ year-old monolithic codebases — directly analogous to Fortune 500 legacy modernisation challenges.
TELUS - Claude (via internal Fuel iX platform)
57,000 employees given direct AI workflow access; developers use Claude Code within VS Code and GitHub for real-time refactoring; pilot programs report 30% faster PR turnaround
Built internal Fuel iX platform with Claude as core engine. Integrated across developer, analyst, and support teams via unified hub. Claude Code embedded in IDEs for real-time code refactoring. Enterprise admin panel enables selective premium capacity allocation by role.
Forrester TEI — composite Fortune 500 (5,000 devs)
$48.3M in developer productivity gains + $18.4M in revenue impact over 3 years (Forrester Total Economic Impact study)
Commissioned by Microsoft/GitHub, Forrester modeled a composite organization of 5,000 developers. Measured time saved per developer per week, defect reduction, and accelerated feature delivery. Adoption took ~11 weeks to reach full productivity gains.
Bank of America - Internal AI coding agents (proprietary)
18,000 developers using coding agents; 20%+ developer efficiency gains; 50%+ reduction in IT service desk calls; 90% of 210,000+ employees on AI tools.
Proprietary GenAI coding assistant deployed across all developers for code writing and optimisation. ROI tracked systematically and reported directly to investors at earnings calls. Part of a broader AI stack including Erica for Employees (equivalent to ~11,000 FTEs).
Anthropic (internal)
Engineers shifted from debugging to implementing features (feature implementation share: 14% → 37% of coding tasks); task complexity rose from 3.2 → 3.8/5; consecutive autonomous tool calls doubled
Tracked Claude Code usage across Anthropic engineers from Feb–Aug 2025. Coded transcripts into task types, measured complexity, and surveyed engineers. Found increasing delegation of autonomy over time—engineers moving up the stack to design/architecture while Claude handles implementation.
Salesforce - Internal AI tools + Agentforce
30% increase in PR velocity; 30% reduction in cycle time; 30 million lines of AI-generated code in production over two years; engineering hiring frozen for all of 2025.
Built internal platform "Prizm" with standardised prompt templates and migration blueprints. AI deployed across the full SDLC — code generation, test case generation, incident detection, and post-incident learning — tracked via PR velocity and cycle time.
Meta - TestGen-LLM (proprietary, internal)
73% of AI-generated test improvements accepted for production; 25% increase in test coverage on Instagram Reels and Stories; 11.5% of all targeted classes improved across Instagram and Facebook test-a-thons.
Built on "Assured Offline LLMSE" methodology: LLM-generated tests must clear a multi-stage filter pipeline before surfacing to engineers — hallucination eliminated architecturally, not just warned against. Peer-reviewed at FSE 2024 (ACM industry track).
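A multi-stage filter of this kind can be pictured as a chain of checks that every generated test must pass before an engineer ever sees it. The Python sketch below is illustrative only: the stage names, the `CandidateTest` fields and the sample data are assumptions, not Meta's actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CandidateTest:
    """Hypothetical LLM-generated test with pre-computed check results."""
    name: str
    builds: bool
    passes: bool
    repeat_results: list[bool]   # outcomes of repeated runs, to catch flaky tests
    coverage_gain: float         # extra coverage vs. the existing suite


Filter = Callable[[CandidateTest], bool]

# Ordered stages (assumed for illustration): cheap checks first, value check last.
FILTERS: list[tuple[str, Filter]] = [
    ("builds", lambda t: t.builds),
    ("passes", lambda t: t.passes),
    ("is stable", lambda t: bool(t.repeat_results) and all(t.repeat_results)),
    ("adds coverage", lambda t: t.coverage_gain > 0),
]


def first_failed_stage(test: CandidateTest) -> Optional[str]:
    """Return the name of the first stage the test fails, or None if it clears all."""
    for stage_name, check in FILTERS:
        if not check(test):
            return stage_name
    return None


def surviving_tests(candidates: list[CandidateTest]) -> list[CandidateTest]:
    """Only tests that clear every stage are surfaced to engineers."""
    return [t for t in candidates if first_failed_stage(t) is None]


candidates = [
    CandidateTest("test_a", True, True, [True, True, True], 1.5),
    CandidateTest("test_b", True, True, [True, False, True], 2.0),   # flaky
    CandidateTest("test_c", True, True, [True, True, True], 0.0),    # no new coverage
    CandidateTest("test_d", False, False, [], 0.0),                  # does not build
]
print([t.name for t in surviving_tests(candidates)])
```

The design point is that a hallucinated or useless test is rejected by a mechanical check rather than by reviewer vigilance, so engineers review only candidates that already build, pass reliably and add measurable value.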
BT Group - Amazon Q Developer
100,000+ lines of code generated in first 4 months; 12% of repetitive engineering work automated; 37% code suggestion acceptance rate across 1,200 engineers.
Piloted with a volunteer cohort and measured hard metrics before full rollout. Deployed within a formal responsible tech guardrail framework covering IP compliance, data privacy, and accountability.
Citigroup - GitHub Copilot + Devin + Citi Stylus Workspaces (Claude + Gemini)
180,000 employees on proprietary AI tools; 100,000 developer hours freed per week; coding tasks that took senior developers 1.5 weeks now done in minutes.
Three-layer stack: Citi Squad coding assistant, Citi Stylus Workspaces (Claude + Gemini), and Cognition's Devin for autonomous task execution across 40,000 developers.
Stripe - Cursor
70%+ of engineers now active Cursor users; meaningful gains in development velocity, large-scale migrations, debugging speed, and new hire onboarding.
Bottom-up adoption converted to org-wide standard IDE. Stripe's CTO cited "significant economic outcomes" — notable given R&D is Stripe's single largest spend category.
NAV IT (Norway, 250 developers)
Statistically significant commit frequency increase; qualitative improvements in developer satisfaction and flow state; 26% more tasks completed (aligned with Microsoft/Accenture RCT benchmark)
18-month longitudinal case study (Sep 2023–May 2025) combining surveys, 13 in-depth interviews, and GitHub activity analysis across 250 developers. Used open coding and thematic analysis. Rare long-duration enterprise study showing sustained adoption and productivity gains.