How AWS EC2 Capacity Blocks Help with GPU Shortage

Piyush Kalra

Feb 17, 2025

The global GPU shortage has left ML engineers and data scientists at the mercy of resource availability. Even though GPUs are essential to their work, the race to AI and ML innovation is getting harder as initiatives stall for lack of available hardware.

This is where AWS EC2 Capacity Blocks take the stage. They address unpredictable GPU access on AWS by letting companies reserve GPU capacity ahead of time. ML teams can now meet deadlines and complete goals without worrying about resource allocation.

In this blog, we are going to explore the GPU shortage problem in depth, understand how Capacity Blocks function, and look at why they are a turning point for ML workloads. At the end, we also provide a guide focused on strategies to get the most out of your reservations.

What Is Causing the Global GPU Shortage?

The global GPU shortage in 2025 can be attributed to these factors:

  1. AI and Enterprise Demand: The rapid growth of AI and ML has pushed demand for GPUs beyond supply. For example, enterprise clients reportedly claimed 60% of Nvidia’s production capacity in Q1 2025.

  2. Manufacturing and Disruptions: Semiconductor manufacturing plants are currently at maximum utilization, and an earthquake in January disrupted production at TSMC, the world’s leading chip manufacturer.

  3. Supply Chain Issues: Shipping delays, increasing prices, and a lack of available components continue to restrict GPU accessibility.

  4. Market Impact: Apart from gaming, companies in AI, research, and content creation are experiencing higher than usual prices and restricted availability, which in turn stunts progress.

How the Shortage Impacts Companies

The scarcity of GPUs has a profound impact on different companies:

  • Delayed product launches and higher costs.

  • Reduced innovation, especially for smaller companies.

  • Slower AI development cycles and reduced overall effectiveness.

Why ML Teams Need a New Strategy

  1. Plan and Reserve Resources: Use cloud-based reservations for predictable access, such as AWS EC2 Capacity Blocks.

  2. Optimize Workloads: Implement model compression and mixed-precision training for better GPU utilization.

  3. Diversify Supply Chains: Mitigate single-supplier dependency through multi-cloud approaches and alternative sources of supply that are less exposed to market swings.

How AWS Offers Relief

Cloud service providers like AWS now offer elastic, scale-out GPU capacity. AWS's GPU-powered P5 and P4d instances handle compute-bound workloads and provide a high-performance environment for ML training and inference, without the need to invest in physical infrastructure.

AWS's hardware-accelerated ecosystem delivers a range of options. But beyond scaling GPU access, what AWS offers is capacity that is more dependable, available on time, and predictably adjustable to the specifics of each workload.

What Are AWS EC2 Capacity Blocks?

These are GPU resource pre-booking arrangements for ML workloads. They ensure that GPUs are available when you need them, avoiding the uncertainty of on-demand provisioning.

Key features include:

  • Future-Dated Reservations: Reserve capacity with a start date up to 8 weeks in advance, for durations from 1 day up to 182 days.

  • Guaranteed Availability: Lock in resources for critical workloads and scheduled training activities.

  • Compatible Instance Types: Supported on GPU instance types such as P5 (NVIDIA H100) and P4d (NVIDIA A100).
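
To make the booking window concrete, here is a minimal sketch that checks whether a desired start date falls inside the 8-week window mentioned in the feature list. The function names are our own, invented for illustration, and are not part of any AWS SDK.

```python
from datetime import date, timedelta

# Assumption taken from the feature list above: a Capacity Block can start
# at most 8 weeks after the day it is booked.
MAX_LEAD_TIME = timedelta(weeks=8)


def latest_start_date(booking_day: date) -> date:
    """Return the last day on which a reservation booked today may start."""
    return booking_day + MAX_LEAD_TIME


def can_book(booking_day: date, desired_start: date) -> bool:
    """True if the start is not in the past and is within the 8-week window."""
    return booking_day <= desired_start <= latest_start_date(booking_day)


if __name__ == "__main__":
    today = date(2025, 2, 17)
    print(latest_start_date(today))             # 2025-04-14
    print(can_book(today, date(2025, 3, 10)))   # True
    print(can_book(today, date(2025, 5, 1)))    # False
```

A check like this is handy in planning scripts: if a training run is more than 8 weeks out, the reservation simply cannot be placed yet.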

Use Cases for Capacity Blocks

AWS EC2 Capacity Blocks support a variety of mission-critical company functions such as:

  1. Training Large-Scale ML Models: Avoid expensive delays in delivering high-performance models by providing GPU resource guarantees.

  2. Running Scheduled Experiments: Execute short-term plans with predictable resource provisioning.

  3. Scaling Distributed Workloads: Use colocated compute infrastructure to significantly speed up distributed training.

Why Capacity Blocks Matter for ML Teams

  1. Predictable GPU Access: Timely GPU access with no queuing issues or scrambling for GPU instances during peak-demand periods.

  2. High-Performance Networking: Utilize collocated compute resources to accelerate distributed model training.

  3. Customized Reservation Options: Reserve the specific instance types that match your workload requirements.

Improved Efficiency and Cost Management 

Setting aside resources leads to greater efficiency, cost savings, and more streamlined operations. Reserved resources eliminate downtimes by ensuring availability and avoiding high costs linked with on-demand pricing. Reserved resources also enable teams to plan their capacity more effectively, avoiding outcomes associated with over-provisioning or underutilizing resources. This fosters enhanced control over costs and smoother operational flow.

How Capacity Blocks Solve Real-World Problems 

The Problem of Unpredictable Availability 

Imagine you work on an ML team responsible for the recommendation engine behind a big product launch. Your team has to load the model onto GPUs and train it, but during peak demand, none are available. This forces you to scramble for alternatives, increases your overall cost, and delays the training phase.

How EC2 Capacity Blocks Provide Solutions 

With EC2 Capacity Blocks, you can avoid this scenario entirely. Let’s say your team knows that the training phase is scheduled to run over the following two weeks. By using Capacity Blocks, you can pre-purchase NVIDIA H100-based GPU instances for that specific time and date. This ensures that the resources are on hand exactly when you need them, giving you smooth training, deadline adherence, and cost predictability.

Without Capacity Blocks, relying on on-demand GPU allocation will eventually lead to resource bottlenecks during peak-demand periods. Capacity Blocks remove that unpredictability so your team can focus on delivering results.

How to Reserve AWS EC2 Capacity Blocks 

  1. Log in to your AWS Console account.

  2. Go to the EC2 Dashboard: Look for Capacity Reservations under the EC2 dashboard.

  3. Choose the capacity reservation type: Select Purchase Capacity Blocks for ML.

  4. Specify Your Capacity Requirements

  • Select the instance type that best suits your ML/AI workload (e.g., P5, P5e, P5en, P4d, Trn1, Trn2).

  • Enter the number of instances you need (up to 64 per block, with a maximum of 256 across all blocks).

  5. Set your reservation duration:

  • Choose from 1–14 days (in 1-day increments) or 15–182 days (in 7-day increments).

  • Pick your start date (up to 8 weeks in advance) and end date.

  6. Search for Available Capacity Blocks: AWS will display up to 3 matching Capacity Block options based on your criteria, showing details like start time, duration, availability zone, and pricing.

  7. Review and Confirm Your Reservation

  • Double-check the instances, pricing, and reservation details.

  • Finalize your selection and proceed to purchase.

  8. Complete Your Payment

  • The reservation status will show as payment-pending until the upfront payment is completed, which typically takes 5 minutes to 12 hours.

  • Once payment is processed, the reservation status will update to scheduled.
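
If you prefer scripting over the console, the same search can be sketched in Python. The example below validates a request against the limits quoted in the steps above (64 instances per block, the 1-day and 7-day duration increments, the 8-week lead time) and builds the search parameters. We read "15–182 days in 7-day increments" as multiples of 7, which is our interpretation. The helper names are hypothetical, and the boto3 call at the end reflects our understanding of the EC2 API, so verify it against the current boto3 documentation before relying on it.

```python
from datetime import datetime, timedelta, timezone

# Limits quoted in the steps above; treat them as assumptions and confirm
# them in the AWS documentation before relying on this check.
MAX_INSTANCES_PER_BLOCK = 64
MAX_LEAD_TIME = timedelta(weeks=8)


def is_valid_duration(days: int) -> bool:
    """1-14 days in 1-day steps, or up to 182 days in multiples of 7."""
    if 1 <= days <= 14:
        return True
    return 15 <= days <= 182 and days % 7 == 0


def build_offering_search(instance_type, instance_count, days,
                          earliest_start, now=None):
    """Validate the request and return search parameters as a dict."""
    now = now or datetime.now(timezone.utc)
    if not 1 <= instance_count <= MAX_INSTANCES_PER_BLOCK:
        raise ValueError("instance_count must be between 1 and 64")
    if not is_valid_duration(days):
        raise ValueError("unsupported reservation duration")
    if not now <= earliest_start <= now + MAX_LEAD_TIME:
        raise ValueError("start date must be within the next 8 weeks")
    return {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
        "CapacityDurationHours": days * 24,
        "StartDateRange": earliest_start,
    }


def search_offerings(params, region="us-east-1"):
    """Hypothetical wrapper: needs boto3 and AWS credentials to run."""
    import boto3  # third-party; imported lazily so the rest runs without it
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_capacity_block_offerings(**params)
    return response["CapacityBlockOfferings"]


if __name__ == "__main__":
    start = datetime.now(timezone.utc) + timedelta(days=7)
    print(build_offering_search("p5.48xlarge", 4, 14, start))
```

Validating locally first keeps obviously invalid requests from ever reaching the API. Purchasing an offering would be a separate follow-up call (boto3 exposes purchase_capacity_block for this, as far as we know), and since it triggers an upfront charge it is best kept behind a manual confirmation.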

Tips for Cost Optimization

  • Use the AWS Pricing Calculator to estimate costs for your reservation.

  • Avoid over-provisioning by carefully monitoring usage with AWS Cost Explorer.

  • Take advantage of discounts by committing to longer reservation periods.

Pricing and Billing Details of EC2 Capacity Blocks

AWS EC2 Capacity Block Pricing for US East (N. Virginia)

Instance Type    Hourly Rate (USD)    GPU / Accelerator    vCPUs    Memory (GiB)    Example Use Case
p5.48xlarge      $98.32               8 x NVIDIA H100      192      2048            LLM training, advanced ML
p4d.24xlarge     $11.80               8 x NVIDIA A100      96       1152            ML model training, inference
trn1.32xlarge    $9.532               16 x AWS Trainium    128      512             Trainium ML workloads
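
One way to compare these options is the hourly rate per accelerator. The sketch below simply divides each hourly rate in the pricing table above by the number of accelerators on that instance. It assumes the listed rates are per instance, and it says nothing about relative performance, so treat it as a rough comparison only.

```python
from decimal import Decimal

# (hourly rate in USD, accelerators per instance), copied from the table above.
RATES = {
    "p5.48xlarge": (Decimal("98.32"), 8),
    "p4d.24xlarge": (Decimal("11.80"), 8),
    "trn1.32xlarge": (Decimal("9.532"), 16),
}


def rate_per_accelerator(instance_type: str) -> Decimal:
    """Hourly rate divided by the number of accelerators on the instance."""
    hourly_rate, accelerators = RATES[instance_type]
    return hourly_rate / accelerators


if __name__ == "__main__":
    for name in RATES:
        print(f"{name}: ${rate_per_accelerator(name)} per accelerator-hour")
```

With the table's figures, this works out to $12.29 per H100-hour, $1.475 per A100-hour, and $0.59575 per Trainium-hour.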

Billing Details

  • Upfront Payment: Full cost charged immediately after reservation. 

  • No Hidden Fees: You pay only the reservation price plus operating system (OS) usage charges; there are no penalties for time left unused within the reserved period.

  • Transparent Pricing: Prices are set at the time of purchase, based on supply and demand, and do not change for the duration of the reservation.

  • OS Charges: Billed separately per-second or per-hour while running. 

  • Flexible Reservations: Reservations can be as short as a day or extend to six months, depending on project goals and timeline.

Key Points

  • No Additional Discounts: Capacity Blocks do not benefit from the Savings Plans or Reserved Instance discounts. The price you see is the price you pay. 

  • Examples of Pricing: In US East (N. Virginia), hourly rates range from $9.532 (trn1.32xlarge) to $98.32 (p5.48xlarge), with upfront billing for the reserved period.
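
Because the full cost is charged upfront, it helps to estimate the bill before you reserve. Here is a minimal sketch of that arithmetic. It assumes the hourly rate applies per instance and stays fixed for the whole reservation, and it leaves out the separately billed OS charges. The price AWS shows for a specific offering is the one that counts.

```python
from decimal import Decimal


def upfront_cost(hourly_rate: Decimal, instances: int, days: int) -> Decimal:
    """Estimated upfront charge: rate x instances x reserved hours."""
    reserved_hours = days * 24
    return hourly_rate * instances * reserved_hours


if __name__ == "__main__":
    # Two weeks on p5.48xlarge at the US East (N. Virginia) rate quoted above.
    rate = Decimal("98.32")
    print(upfront_cost(rate, instances=1, days=14))   # 33035.52
    print(upfront_cost(rate, instances=4, days=14))   # 132142.08
```

For the two-week training scenario described earlier, a single p5.48xlarge comes to about $33,000 upfront, so it pays to size the instance count carefully.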

Conclusion

To stay ahead in the race of AI and ML technologies, companies have to make efficient use of the GPU resources available to them. Adopting a proactive reservation approach helps ensure seamless development and deployment. Reserving capacity, preparing your teams, and partnering with AWS specialists can greatly accelerate delivery. Scheduling GPUs is no longer just an operations decision, but a competitive strategy.

Join Pump for Free

If you are an early-stage startup that wants to cut its cloud costs, this is your chance. Pump helps you save up to 60% in cloud costs, and the best thing about it is that it is absolutely free!

Pump provides personalized solutions that allow you to effectively manage and optimize your Azure, GCP and AWS spending. Take complete control over your cloud expenses and ensure that you get the most from what you have invested. Who would pay more when we can save better?

Are you ready to take control of your cloud expenses?

1390 Market Street, San Francisco, CA 94102

Made with

in San Francisco, CA

© All rights reserved. Pump Billing, Inc.
