Google Flex-start VMs GA: Up to 53% GPU Discounts for AI Workloads
CloudCostChefs Team
The GPU Budget Problem Every AI Team Faces
Ever wonder why GPU bills eat your cloud budget alive? You're running batch inference jobs, fine-tuning models, or processing HPC workloads—and every hour of A100 or H100 time burns through your budget like a kitchen torch through creme brulee.
Google Cloud just served up a solution: Flex-start VMs are now Generally Available, offering up to 53% discounts on GPUs and TPUs. The catch? Your workload needs a little patience.
What Just Happened: Flex-start VMs Hit GA
On September 25, 2025, Google announced General Availability of Flex-start VMs through the Compute Engine Instance API. This isn't a preview anymore—it's production-ready.
The Numbers That Matter
Maximum Discount
Save up to 53% on vCPUs, GPUs, and TPUs with Flex-start pricing
2-Hour Queue Window
Your request can wait in the queue for up to 2 hours for capacity (the wait window is configurable from 90 seconds to 2 hours)
Maximum Runtime
Once provisioned, run continuously for up to 7 days
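To make the headline discount concrete, here is a rough sketch of the arithmetic. The hourly rate is a placeholder you supply yourself, not a published Google Cloud price, and 53% is the maximum discount, so real savings may be lower.

```shell
# Estimate the cost of a run with and without a percentage discount.
# Usage: estimate_cost HOURLY_RATE HOURS DISCOUNT_PCT
# HOURLY_RATE is an assumption: plug in the on-demand price that applies to you.
estimate_cost() {
  local rate="$1" hours="$2" discount="$3"
  awk -v r="$rate" -v h="$hours" -v d="$discount" 'BEGIN {
    full = r * h                      # on-demand cost for the whole run
    flex = full * (1 - d / 100)       # same run at the discounted rate
    printf "on_demand=%.2f flex_start=%.2f saved=%.2f\n", full, flex, full - flex
  }'
}
```

For example, `estimate_cost 10 72 53` (a made-up rate of 10 per hour for a 72-hour run) prints `on_demand=720.00 flex_start=338.40 saved=381.60`.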
How Flex-start VMs Work
Flex-start VMs use Google's Dynamic Workload Scheduler (DWS) to provision compute resources from a secure capacity pool. Instead of the traditional "fail immediately if no capacity" model, your request joins a managed queue.
Submit Request
Create a Flex-start VM with your desired machine type (A2, A3, A4, G4, or H4D)
Queue Processing
VM enters PENDING state while DWS finds capacity (up to 2 hours)
Provisioning
When capacity is available, your VM starts automatically at discounted rates
Execution
Run your workload for up to 7 days continuously
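If you script around the queue, one option is to poll the VM's status until it reports RUNNING. The function below is a minimal sketch, assuming gcloud is installed and authenticated. The function name, default timeout, and poll interval are illustrative choices, and any status other than RUNNING (including a failed lookup) is treated as "keep waiting".

```shell
# Poll a VM until it reports RUNNING, or give up after a timeout.
# Usage: wait_for_running VM_NAME ZONE [TIMEOUT_SECONDS] [POLL_SECONDS]
wait_for_running() {
  local name="$1" zone="$2" timeout="${3:-7200}" interval="${4:-60}"
  local waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    # value(status) prints only the lifecycle state, e.g. PENDING or RUNNING.
    status="$(gcloud compute instances describe "$name" \
      --zone="$zone" --format='value(status)' 2>/dev/null)" || status="UNKNOWN"
    if [ "$status" = "RUNNING" ]; then
      echo "VM $name is RUNNING after ~${waited}s"
      return 0
    fi
    echo "VM $name is $status; checking again in ${interval}s"
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Gave up waiting for $name after ${timeout}s" >&2
  return 1
}
```

A call such as `wait_for_running my-flex-start-vm us-central1-a` would then block until the VM is up or two hours have passed, which lets a pipeline start the actual job only once capacity has arrived.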
Supported Machine Types with Discounts
Not all accelerator-optimized machine types receive the Flex-start discount. Here's what qualifies:
| Machine Type | GPU/Accelerator | Discount Eligible |
|---|---|---|
| A4 | NVIDIA B200 | Yes |
| A3 (Mega/High/Ultra) | NVIDIA H100 / H200 | Yes |
| A2 (Standard/Ultra) | NVIDIA A100 | Yes |
| G4 | NVIDIA RTX PRO 6000 | Yes |
| H4D | CPU-optimized (no GPU) | Yes |
| A4X | NVIDIA GB200 NVL72 | Not supported |
Note: Other accelerator-optimized machine types can use Flex-start provisioning but don't receive the discount.
Perfect Use Cases for Flex-start VMs
Flex-start VMs shine when your workload can tolerate a startup delay. Here's where they deliver maximum value:
Batch Inference
Process large datasets overnight or during off-peak hours. Queue your job before leaving and wake up to results at up to 53% lower cost.
Model Fine-Tuning
Fine-tuning runs don't need to start immediately. Submit your job, let it queue, and save significantly on GPU hours.
HPC Simulations
Scientific computing, molecular dynamics, and CFD simulations that run for hours or days are perfect candidates.
Research Experiments
ML research, hyperparameter sweeps, and ablation studies where latency doesn't matter but cost does.
Not Recommended For:
- Real-time inference serving production traffic
- Interactive development requiring immediate GPU access
- SLA-bound workloads with strict availability requirements
- Workloads requiring placement policies or reservations
Pro Feature: Stop/Start for Workload Checkpointing
One of the most powerful features of Flex-start VMs is the ability to stop and restart your instances. This enables sophisticated cost management:
Pause Billing
Stop your VM to halt billing while preserving your boot disk and configuration.
Reset the 7-Day Clock
After a successful restart, your 7-day runtime limit resets, so a job that checkpoints its progress can keep running across multiple 7-day windows.
Preserve Configuration
IP addresses and boot disks persist across stop/start cycles—no re-setup required.
Tip: Set instanceTerminationAction = STOP to automatically stop (instead of delete) when hitting the 7-day limit.
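As a planning aid, the sketch below estimates how many run windows a long job would need if each window is capped at 7 days (168 hours). It assumes the job checkpoints at least once per window and resumes cleanly after a restart. The function name is hypothetical, and the stop and start commands in the comments reuse the VM name and zone from this article's example.

```shell
# Number of run windows a job needs, given a per-window cap in hours.
# Usage: run_windows TOTAL_HOURS [CAP_HOURS]   (cap defaults to 168h = 7 days)
run_windows() {
  local total_hours="$1" cap_hours="${2:-168}"
  # Integer ceiling division: round up so a partial window still counts.
  echo $(( (total_hours + cap_hours - 1) / cap_hours ))
}

# Between windows, once a checkpoint is safely written, the cycle would be:
#   gcloud compute instances stop  my-flex-start-vm --zone=us-central1-a
#   gcloud compute instances start my-flex-start-vm --zone=us-central1-a
```

For instance, `run_windows 400` prints `3`: a 400-hour job spans three windows, so it needs at least two stop/start cycles.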
Quick Start: Create a Flex-start VM
Using gcloud CLI:
gcloud compute instances create my-flex-start-vm \
    --machine-type=a3-megagpu-8g \
    --provisioning-model=FLEX_START \
    --max-run-duration=3d \
    --request-valid-for-duration=2h \
    --zone=us-central1-a
Key Parameters:
- --provisioning-model=FLEX_START: enables Flex-start mode
- --max-run-duration: how long the VM runs (up to 7d)
- --request-valid-for-duration: queue timeout (90s to 2h)
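Before submitting a request, it can help to sanity-check the two durations against the limits quoted above (a run of at most 7d, a queue timeout between 90s and 2h). This is a minimal sketch with hypothetical helper names. It only understands a single number followed by s, m, h, or d, which may be narrower than what gcloud itself accepts.

```shell
# Convert a duration such as 90s, 30m, 2h, or 3d into seconds.
to_seconds() {
  local unit="${1##*[0-9]}"      # trailing unit letter, e.g. "h"
  local value="${1%"$unit"}"     # leading number, e.g. "2"
  case "$value" in ''|*[!0-9]*) echo "bad duration: $1" >&2; return 1 ;; esac
  case "$unit" in
    s) echo "$value" ;;
    m) echo $((value * 60)) ;;
    h) echo $((value * 3600)) ;;
    d) echo $((value * 86400)) ;;
    *) echo "unsupported unit in: $1" >&2; return 1 ;;
  esac
}

# Usage: check_flex_start_durations MAX_RUN_DURATION REQUEST_VALID_FOR_DURATION
check_flex_start_durations() {
  local run queue
  run="$(to_seconds "$1")" || return 1
  queue="$(to_seconds "$2")" || return 1
  if [ "$run" -gt $((7 * 86400)) ]; then
    echo "max run duration $1 exceeds 7d" >&2; return 1
  fi
  if [ "$queue" -lt 90 ] || [ "$queue" -gt 7200 ]; then
    echo "queue timeout $2 is outside 90s to 2h" >&2; return 1
  fi
  echo "ok: run=${run}s queue=${queue}s"
}
```

With the values from the command above, `check_flex_start_durations 3d 2h` prints `ok: run=259200s queue=7200s`, while `8d` for the run or `30s` for the queue is rejected with a non-zero exit status.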
Know the Limitations
No Placement Policies
You cannot apply placement policies to Flex-start VMs. If you need specific hardware placement, use standard provisioning.
No Reservations
Flex-start VMs cannot consume reserved capacity. They're designed for flexible, non-reserved workloads.
Host Maintenance = Stop
Flex-start VMs are stopped during host maintenance events rather than kept running. Checkpoint often enough that an unplanned stop costs you little work.
Preemptible Quota Required
Flex-start VMs consume preemptible quota. Ensure you have sufficient quota for your desired resources.
Chef's Rule #47
"The patient chef gets the best ingredients at the lowest price."
Flex-start VMs reward patience with massive savings. If your workload can wait 2 hours to start, you shouldn't be paying full on-demand prices.
Real Customer Impact
Hudson River Trading (HRT)
Uses Flex-start's direct API access for custom scheduling in their quantitative trading workloads—integrating discounted GPU access into sophisticated automated pipelines.
Oz Forensics
Trains anti-fraud models with reliable A100 GPU access. The queuing mechanism eliminated their manual retry loops and reduced GPU costs significantly.
The Bottom Line
Flex-start VMs are Google Cloud's answer to the GPU cost crisis. Up to 53% savings, 7-day runtime, and intelligent queuing make them ideal for batch inference, fine-tuning, HPC, and research workloads.
This is how smart cloud kitchens run AI workloads profitably.