Google Flex-start VMs GA: Up to 53% GPU Discounts for AI Workloads

The GPU Budget Problem Every AI Team Faces

Ever wonder why GPU bills eat your cloud budget alive? You're running batch inference jobs, fine-tuning models, or processing HPC workloads, and every hour of A100 or H100 time burns through your budget like a kitchen torch through crème brûlée.

Google Cloud just served up a solution: Flex-start VMs are now Generally Available, offering discounts of up to 53% on GPUs and TPUs. The catch? Your workload needs a little patience.

What Just Happened: Flex-start VMs Hit GA

On September 25, 2025, Google announced General Availability of Flex-start VMs through the Compute Engine Instance API. This isn't a preview anymore—it's production-ready.

• Available via Compute Engine API, gcloud CLI, and Console
• Discounted pricing for A2, A3, A4, G4, and H4D machine types
• Uses Dynamic Workload Scheduler (DWS) for fair queuing
• Stop/start support for workload checkpointing

The Numbers That Matter

Maximum Discount: 53%

Save up to 53% on vCPUs, GPUs, and TPUs with Flex-start pricing

Queue Window: 2 Hours

Your request waits up to 2 hours for GPU availability (configurable from 90 seconds to 2 hours)

Maximum Runtime: 7 Days

Once provisioned, run continuously for up to 7 days
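
To make the headline number concrete, here is a minimal sketch of the arithmetic. The $88.00 hourly rate is a made-up placeholder, not a published Google Cloud price, and 53% is the maximum discount, so actual savings may be smaller.

```python
def flex_start_cost(on_demand_hourly: float, hours: float,
                    discount: float = 0.53) -> float:
    """Estimated cost of a run at a given discount (0.53 = the 53% maximum)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_hourly * hours * (1.0 - discount)


# Hypothetical numbers: a VM billed at $88.00/hour, running the full 7 days.
HOURLY_RATE = 88.00       # assumption for illustration, not a real list price
RUN_HOURS = 7 * 24        # maximum continuous runtime

on_demand = HOURLY_RATE * RUN_HOURS
flex = flex_start_cost(HOURLY_RATE, RUN_HOURS)
print(f"on-demand: ${on_demand:,.2f}")
print(f"flex-start: ${flex:,.2f}")
print(f"saved: ${on_demand - flex:,.2f}")
```

Swap in the rate from the pricing page for your machine type and region to get a realistic estimate.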

How Flex-start VMs Work

Flex-start VMs use Google's Dynamic Workload Scheduler (DWS) to provision compute resources from a secure capacity pool. Instead of the traditional "fail immediately if no capacity" model, your request joins a managed queue.

Submit Request

Create a Flex-start VM with your desired machine type (A2, A3, A4, G4, or H4D)

Queue Processing

VM enters PENDING state while DWS finds capacity (up to 2 hours)

Provisioning

When capacity is available, your VM starts automatically at discounted rates

Execution

Run your workload for up to 7 days continuously
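
The four steps above can be sketched as a toy simulation. This is a model of the lifecycle as described here, not Google's scheduler: the function name, the `Outcome` type, and the state labels (`EXPIRED`, `COMPLETED`, `RUN_LIMIT_REACHED`) are illustrative assumptions, not Compute Engine status values.

```python
from dataclasses import dataclass
from typing import Optional

MIN_QUEUE_SECONDS = 90                  # shortest queue timeout
MAX_QUEUE_SECONDS = 2 * 60 * 60         # 2-hour queue window
MAX_RUN_SECONDS = 7 * 24 * 60 * 60      # 7-day runtime limit


@dataclass
class Outcome:
    state: str                  # illustrative label, not an API status value
    started_at: Optional[int]   # seconds after submission; None if never started
    ran_for: int                # seconds of execution


def simulate_request(capacity_at: Optional[int], valid_for: int,
                     workload_seconds: int,
                     max_run: int = MAX_RUN_SECONDS) -> Outcome:
    """Walk one request through queue -> provisioning -> execution."""
    if not MIN_QUEUE_SECONDS <= valid_for <= MAX_QUEUE_SECONDS:
        raise ValueError("valid_for must be between 90 seconds and 2 hours")
    if not 0 < max_run <= MAX_RUN_SECONDS:
        raise ValueError("max_run must be positive and at most 7 days")

    # Queue processing: the request waits until capacity appears
    # or its validity window closes, whichever comes first.
    if capacity_at is None or capacity_at > valid_for:
        return Outcome("EXPIRED", None, 0)

    # Provisioning and execution: the VM starts, then runs until the
    # workload finishes or the run-duration limit is reached.
    ran_for = min(workload_seconds, max_run)
    state = "COMPLETED" if workload_seconds <= max_run else "RUN_LIMIT_REACHED"
    return Outcome(state, capacity_at, ran_for)


# Capacity appears 30 minutes into a 2-hour window; the job needs 1 hour.
print(simulate_request(capacity_at=1800, valid_for=7200, workload_seconds=3600))
```

The point of the model is the first branch: a request that finds no capacity within its window ends without ever starting, so a pipeline built on Flex-start still needs a plan for resubmitting.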

Supported Machine Types with Discounts

Not all accelerator-optimized machine types receive the Flex-start discount. Here's what qualifies:

Machine Type	GPU/Accelerator	Discount Eligible
A4	NVIDIA B200	Yes
A3 (Mega/High/Ultra)	NVIDIA H100 / H200	Yes
A2 (Standard/Ultra)	NVIDIA A100	Yes
G4	NVIDIA RTX PRO 6000	Yes
H4D	CPU-optimized (no GPU)	Yes
A4X	NVIDIA GB200 NVL72	Not supported

Note: Other accelerator-optimized machine types can use Flex-start provisioning but don't receive the discount.

Perfect Use Cases for Flex-start VMs

Flex-start VMs shine when your workload can tolerate a startup delay. Here's where they deliver maximum value:

Batch Inference

Process large datasets overnight or during off-peak hours. Queue your job before leaving and wake up to results at up to 53% lower cost.

Model Fine-Tuning

Fine-tuning runs don't need to start immediately. Submit your job, let it queue, and save significantly on GPU hours.

HPC Simulations

Scientific computing, molecular dynamics, and CFD simulations that run for hours or days are perfect candidates.

Research Experiments

ML research, hyperparameter sweeps, and ablation studies where latency doesn't matter but cost does.

Not Recommended For:

• Real-time inference serving production traffic
• Interactive development requiring immediate GPU access
• SLA-bound workloads with strict availability requirements
• Workloads requiring placement policies or reservations

Pro Feature: Stop/Start for Workload Checkpointing

One of the most powerful features of Flex-start VMs is the ability to stop and restart your instances. This enables sophisticated cost management:

Pause Billing

Stop your VM to halt billing while preserving your boot disk and configuration.

Reset the 7-Day Clock

After a successful restart, your 7-day runtime limit resets, so a workload that checkpoints its progress can keep going across several runs instead of being capped at a single 7-day window.

Preserve Configuration

IP addresses and boot disks persist across stop/start cycles—no re-setup required.

Tip: Set instanceTerminationAction to STOP (--instance-termination-action=STOP in gcloud) so the VM is stopped, not deleted, when it reaches its run-duration limit.
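
Stop/start only pays off if the workload can pick up where it left off. Below is a minimal checkpoint-and-resume sketch using only the Python standard library. The file layout, the `next_step` key, and the function names are assumptions for illustration; a real training job would save model and optimizer state with its framework's own tools.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write the checkpoint atomically so a stop mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_step": 0}
    with open(path) as f:
        return json.load(f)


def run(path: str, total_steps: int, stop_after: Optional[int] = None) -> int:
    """Resume from the last checkpoint; `stop_after` simulates a VM stop."""
    state = load_checkpoint(path)
    done_this_session = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and done_this_session >= stop_after:
            break
        # One unit of real work (a training step, a batch) would go here.
        state["next_step"] += 1
        done_this_session += 1
        save_checkpoint(path, state)
    return state["next_step"]


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "progress.json")
    print(run(ckpt, total_steps=10, stop_after=4))  # first session is interrupted
    print(run(ckpt, total_steps=10))                # second session resumes
```

Keep the checkpoint file on storage that survives a stop, such as the boot disk mentioned above. Saving after every step is the simplest policy; saving every N steps trades a little repeated work for less I/O.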

Quick Start: Create a Flex-start VM

Using gcloud CLI:

gcloud compute instances create my-flex-start-vm \
 --machine-type=a3-megagpu-8g \
 --provisioning-model=FLEX_START \
 --max-run-duration=3d \
 --request-valid-for-duration=2h \
 --zone=us-central1-a

Key Parameters:

--provisioning-model=FLEX_START — Enables Flex-start mode
--max-run-duration — How long the VM runs (up to 7d)
--request-valid-for-duration — Queue timeout (90s to 2h)
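
Before submitting, it can help to check both durations against the limits listed above. The sketch below is a simplified, hypothetical pre-flight check: it only understands a single number plus one unit (`s`, `m`, `h`, `d`), which is narrower than what gcloud itself may accept, and the function names are not part of any Google tool.

```python
import re
from typing import List

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

MAX_RUN_SECONDS = 7 * 86400     # --max-run-duration: up to 7d
MIN_QUEUE_SECONDS = 90          # --request-valid-for-duration: from 90s
MAX_QUEUE_SECONDS = 2 * 3600    # --request-valid-for-duration: up to 2h


def parse_duration(text: str) -> int:
    """Convert a simple duration such as '90s', '2h', or '7d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def validate_flex_start(max_run_duration: str,
                        request_valid_for: str) -> List[str]:
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    run_seconds = parse_duration(max_run_duration)
    if not 0 < run_seconds <= MAX_RUN_SECONDS:
        problems.append("max-run-duration must be greater than 0 and at most 7d")
    queue_seconds = parse_duration(request_valid_for)
    if not MIN_QUEUE_SECONDS <= queue_seconds <= MAX_QUEUE_SECONDS:
        problems.append("request-valid-for-duration must be between 90s and 2h")
    return problems


print(validate_flex_start("3d", "2h"))   # the values from the command above
print(validate_flex_start("8d", "30s"))  # both out of range
```

Catching a bad value locally is cheaper than discovering it after the request has been rejected.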

Know the Limitations

No Placement Policies

You cannot apply placement policies to Flex-start VMs. If you need specific hardware placement, use standard provisioning.

No Reservations

Flex-start VMs cannot consume reserved capacity. They're designed for flexible, non-reserved workloads.

Host Maintenance = Stop

During host maintenance events, Flex-start VMs must stop. Plan your checkpointing accordingly.

Preemptible Quota Required

Flex-start VMs consume preemptible quota. Ensure you have sufficient quota for your desired resources.

Chef's Rule #47

"The patient chef gets the best ingredients at the lowest price."

Flex-start VMs reward patience with massive savings. If your workload can wait up to 2 hours to start, you shouldn't be paying full on-demand prices.

Real Customer Impact

Hudson River Trading (HRT)

Uses Flex-start's direct API access for custom scheduling in their quantitative trading workloads—integrating discounted GPU access into sophisticated automated pipelines.

Oz Forensics

Trains anti-fraud models with reliable A100 GPU access. The queuing mechanism eliminated their manual retry loops and reduced GPU costs significantly.

The Bottom Line

Flex-start VMs are Google Cloud's answer to the GPU cost crisis. Up to 53% savings, 7-day runtime, and intelligent queuing make them ideal for batch inference, fine-tuning, HPC, and research workloads.

This is how smart cloud kitchens run AI workloads profitably.
