The Cheap-GPU Era Is Over: AWS's 15% Hike and the New Capacity Math

AWS raised EC2 GPU capacity prices roughly 15% in early 2026, breaking a two-decade deflationary trend. Here's what shifted, who it hits hardest, and the cost traps the new pricing introduces.

Cloud compute has gotten cheaper every year for two decades. As of early 2026, that streak is over for the part of the cloud that matters most for AI: GPU capacity.

AWS raised prices for EC2 Capacity Blocks for ML by roughly 15% in the first quarter of 2026, citing tighter supply and surging enterprise demand. H100 8-GPU instances now run around $55 to $60 per hour on AWS, $80 to $90 per hour on Google Cloud, and roughly $98 per hour on Azure in U.S. regions. Neocloud providers are still cheaper, sometimes by 30 to 50 percent, but increasingly only via multi-year take-or-pay contracts.
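To compare these instance prices on a common basis, it can help to convert them to a per-GPU-hour figure. The sketch below uses the midpoints of the ranges quoted above as illustrative inputs. The midpoint choice, the helper names, and the neocloud range derived from the AWS figure are assumptions for illustration, not published rates.

```python
# Illustrative 8-GPU H100 instance prices in USD per hour, taken from the
# ranges quoted in the text. Using midpoints is an assumption, not a quote.
INSTANCE_PRICE_PER_HOUR = {
    "aws": (55.0 + 60.0) / 2,
    "gcp": (80.0 + 90.0) / 2,
    "azure": 98.0,
}
GPUS_PER_INSTANCE = 8


def per_gpu_hour(instance_price: float, gpus: int = GPUS_PER_INSTANCE) -> float:
    """Convert an instance-hour price to a per-GPU-hour price."""
    return instance_price / gpus


def discounted(price: float, discount: float) -> float:
    """Apply a fractional discount (0.30 means 30 percent off)."""
    return price * (1.0 - discount)


per_gpu = {cloud: per_gpu_hour(p) for cloud, p in INSTANCE_PRICE_PER_HOUR.items()}
for cloud, price in per_gpu.items():
    print(f"{cloud}: ${price:.2f} per GPU-hour")

# Hypothetical neocloud range: 30 to 50 percent below the AWS per-GPU figure.
neocloud_range = (
    discounted(per_gpu["aws"], 0.50),
    discounted(per_gpu["aws"], 0.30),
)
print(f"neocloud (assumed): ${neocloud_range[0]:.2f} to ${neocloud_range[1]:.2f}")
```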

The pricing line moved. The capacity model under it moved too. If your AI roadmap was sized against 2024 cloud economics, the plan is wrong now.

What actually changed

Three things, and they reinforce each other.

One. GPU compute is no longer subject to the deflationary curve that shaped cloud cost planning for two decades. Supply is tight, demand from foundation-model training and enterprise inference is structurally high, and the hyperscalers have realized they can charge for scarcity. The 15% AWS hike is the visible signal. The pattern is what to plan around.

Two. Pay-as-you-go is being quietly replaced by take-or-pay. The cheapest GPU capacity in the market (neocloud providers offering H100 and H200 clusters at discounts of up to 30 to 50 percent to AWS) generally requires multi-year commitments with minimum spend floors. The optionality of on-demand cloud is the part that’s being repriced. Companies that bet on flexibility now pay for it.

Three. Inference, not training, is now where the money goes. Industry estimates attribute 55 to 80 percent of enterprise GPU spending to inference workloads. Training is a one-time bill. Inference is rent.

Who this hits hardest

Three profiles take the brunt:

The pilot-to-production company. You built a proof of concept on on-demand GPU. You’re about to scale to real traffic. Your unit economics were calculated against last year’s pricing and you haven’t re-run them. The 15% hike alone may have moved your gross margin by a few points; the inference share of cost compounds that.

The multi-region enterprise. You priced your AI strategy assuming you could move workloads between AWS, GCP, and Azure based on availability. With each cloud at a different price point and tight capacity at all three, that flexibility is now expensive. Per-region inference cost can vary by 50 percent or more.

The compliance-bound buyer. You can’t put workloads on neoclouds whose data residency or compliance story you can’t validate. So you pay the hyperscaler premium and have less leverage in negotiations than the team that can credibly threaten to leave.
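For the pilot-to-production case above, the margin effect can be estimated with simple arithmetic: if GPU compute equals some share of revenue, a price increase on that share comes straight out of gross margin. The sketch below assumes customer pricing and all other costs hold constant. The function name and the example figures (60 percent margin, GPU spend at 25 percent of revenue) are hypothetical.

```python
def gross_margin_after_hike(
    gross_margin: float,
    gpu_cost_share_of_revenue: float,
    price_increase: float,
) -> float:
    """Gross margin after a GPU price increase, all else held constant.

    All arguments are fractions (0.60 means 60 percent). The GPU cost share
    is GPU spend divided by revenue, before the increase.
    """
    extra_cost = gpu_cost_share_of_revenue * price_increase
    return gross_margin - extra_cost


# Hypothetical company: 60% gross margin, GPU spend equal to 25% of revenue.
before = 0.60
after = gross_margin_after_hike(before, 0.25, 0.15)
points_lost = (before - after) * 100
print(f"gross margin: {before:.2%} -> {after:.2%} ({points_lost:.2f} points)")
```

With these inputs the 15% hike costs 3.75 margin points, and the loss grows in proportion to the GPU share of revenue.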

The cost trap nobody is talking about

Take-or-pay contracts at neocloud providers look cheap on the spreadsheet. They are cheap only if you can accurately forecast your inference demand over a two-year window. Most companies cannot.

We’ve seen forecast errors of 40 to 60 percent in client AI workloads even when the underlying product roadmap was relatively stable. Token-based pricing, model upgrades that change cost-per-call, and seasonality on the product side all interact. Lock in too much capacity and you pay for idle GPUs. Lock in too little and you pay on-demand rates for the overflow, which, given the new pricing, may cost more than a larger commitment would have.

The right answer is rarely “buy the biggest commitment.” It is almost always “build the forecasting and observability you need to right-size, then commit only against the floor of your demand.”
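One way to make "commit only against the floor" concrete is a break-even utilization. A committed GPU-hour bought at discount d to on-demand beats on-demand only if you use more than (1 - d) of what you committed. The sketch below assumes a flat discount and a single on-demand price. The function names are illustrative, not from any provider's contract terms.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of committed hours you must use for a commitment to match
    on-demand cost. Committed cost is C * p * (1 - d); on-demand cost for the
    hours actually used is u * C * p; they are equal when u = 1 - d."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def effective_price_per_used_hour(committed_price: float, utilization: float) -> float:
    """What each hour you actually used cost, once idle hours are counted."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return committed_price / utilization


def commitment_beats_on_demand(discount: float, utilization: float) -> bool:
    """True if the commitment is cheaper than buying the used hours on demand."""
    return utilization > break_even_utilization(discount)


for d in (0.30, 0.50):
    print(f"discount {d:.0%}: break-even utilization {break_even_utilization(d):.0%}")
```

Under these assumptions, a commitment sized at 100 percent of forecast that lands 40 percent low runs at 60 percent utilization. That loses to on-demand at a 30 percent discount and still wins at a 50 percent discount, which is why the size of the forecast error matters as much as the headline price.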

What to do this quarter

  1. Re-run unit economics with 2026 prices, both on-demand and committed. If you haven’t refreshed your AI cost model since Q4 2024, it’s wrong.

  2. Separate training from inference in your cost model. Track them on different schedules and against different commitment strategies. Treating them as one number hides the leverage points.

  3. Stress-test your take-or-pay scenarios. Model the inference forecast at -40%, baseline, and +40%. If a downside scenario kills your margin, you’ve committed too much.

  4. Build a multi-cloud fallback even if you don’t use it. The credible threat to migrate is what gives you negotiating leverage when contracts renew. Without it, the hyperscaler price is the price.
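The stress test in step 3 can be sketched as a small model: pay for the committed floor regardless of use, pay on-demand rates for any overflow, and compute gross margin at each demand scenario. Every name and number below is hypothetical (the prices, the revenue per GPU-hour, and the assumption that revenue scales linearly with demand), so treat it as a starting template rather than a finished cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    committed_gpu_hours: float   # monthly take-or-pay floor
    committed_price: float       # USD per committed GPU-hour
    on_demand_price: float       # USD per overflow GPU-hour
    revenue_per_gpu_hour: float  # USD of revenue per GPU-hour served (assumed linear)


def monthly_cost(plan: Plan, demand_gpu_hours: float) -> float:
    """Committed hours are paid for whether used or not; overflow is on-demand."""
    overflow = max(0.0, demand_gpu_hours - plan.committed_gpu_hours)
    return (plan.committed_gpu_hours * plan.committed_price
            + overflow * plan.on_demand_price)


def gross_margin(plan: Plan, demand_gpu_hours: float) -> float:
    revenue = demand_gpu_hours * plan.revenue_per_gpu_hour
    return (revenue - monthly_cost(plan, demand_gpu_hours)) / revenue


def stress_test(plan: Plan, baseline_demand: float,
                shocks=(-0.40, 0.0, 0.40)) -> dict:
    """Gross margin at each demand shock relative to baseline."""
    return {s: gross_margin(plan, baseline_demand * (1.0 + s)) for s in shocks}


# Hypothetical inputs: 10,000 GPU-hours baseline demand per month.
full = Plan(10_000, 4.0, 7.0, 10.0)   # commit to the whole forecast
floor = Plan(6_000, 4.0, 7.0, 10.0)   # commit only to the -40% floor
for name, plan in (("full", full), ("floor", floor)):
    results = stress_test(plan, 10_000)
    print(name, {f"{s:+.0%}": f"{m:.1%}" for s, m in results.items()})
```

With these made-up inputs, committing to the full forecast gives the best baseline margin but the worst downside, while committing to the floor holds margin steady in the downside case at the price of a lower baseline. Training and inference would each get their own plan under step 2.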

The honest read

The cheap-GPU era ending is not a temporary supply blip. It is a structural shift in how cloud capacity for AI gets priced and committed, and it changes the shape of the FinOps problem. The right response is not to panic-buy commitments or chase the cheapest neocloud — it’s to build the cost observability and demand forecasting that let you make the commitment decision with data.

If you’re planning a cloud migration, reviewing your IaC setup, or trying to get GPU and inference FinOps under control, request a consultation.
