AI & Machine Learning · Feature

How Much VRAM You Need for Stable Diffusion in 2026 Depends on Which One

Flux.2 Dev needs 32GB of VRAM at FP8. Stable Diffusion 1.5 needs 4GB. The honest answer lives between those two numbers, and it does not point where Nvidia wants it to point.

Close-up of an Nvidia GeForce RTX graphics card installed in a computer. Photo: Kinja

On November 25, 2025, Black Forest Labs released Flux.2. The full model has 32 billion parameters and needs 90GB of VRAM to run at native precision. That release also made every existing article on how much VRAM for Stable Diffusion obsolete, because the question assumes there is one Stable Diffusion. In 2026, "Stable Diffusion" is shorthand for roughly fifteen different models spanning a 22x VRAM range, from SD 1.5 on a GTX 1060 up through Flux.2 Dev on an H200.

Key Takeaway

  • 12GB VRAM is the honest floor for modern Stable Diffusion work in 2026. 24GB is the correct target for anything released in the last eighteen months.
  • A used RTX 3090 at $700 to $1,050 is the right card for almost everyone. It beats a new $1,200 RTX 5080 outright for AI, because 16GB is not enough.
  • Flux.2 Dev at FP8 needs 32GB of VRAM. Only the RTX 5090 can run it natively.
  • Software choice changes the answer by 30 to 40 percent. ComfyUI is leaner than Automatic1111 on the same workflow.
  • Quantization (Q8 GGUF or FP8) is the other half of the answer. Most people cannot tell Q8 from full precision in a blind test.

The short answer: 12GB is the honest floor, 24GB is the correct target, and a used RTX 3090 at $700 to $1,050 is the right card for almost everyone who is not getting paid to train models. The long answer, explaining why a $1,200 RTX 5080 is the worst AI purchase in Nvidia's current lineup, takes the rest of this article.

The four VRAM tiers that actually matter in 2026

Current diffusion models fall into four rough tiers, and the one you care about determines your answer.

Frontier tier. Flux.2 Dev (32B parameters) was built for H100-class datacenter GPUs. FP8 quantization cuts VRAM usage to roughly 32GB, putting it within reach of the RTX 5090 but nothing below. Q4 GGUF compresses it further to 19GB, which runs on a 24GB RTX 4090 with the text encoder offloaded to the CPU.

Mainstream tier. Flux.1 Dev (12B parameters) and Stable Diffusion 3.5 Large (8.1B parameters). Flux.1 Dev needs 24GB at FP16, 12 to 15GB at FP8 or Q8 GGUF, and 6 to 8GB at Q4. SD 3.5 Large officially required 18GB, but Nvidia's TensorRT plus FP8 quantization cut that to 11GB and made it 2.3x faster as a bonus. Most serious local users target this tier.

Consumer-friendly tier. Stable Diffusion 3.5 Medium at 9.9GB, and Flux.2 Klein 4B at roughly 13GB. Built for midrange cards the frontier tier left behind.

Legacy tier. SDXL (4 to 8GB depending on software) and Stable Diffusion 1.5 (4 to 6GB). These still have the largest LoRA ecosystems by far. If your only requirement is "generate a 1024x1024 image from text," an 8GB card works fine.

Practical takeaway: if you only care about SDXL and SD 1.5, 8GB is enough. If you want anything from the last eighteen months of releases, 12GB is your floor and 24GB is your comfort zone. If you are still deciding which model to run at all, our breakdown of the best AI image generators in 2026 walks through where Midjourney, GPT Image 1.5, Flux, and Ideogram each make sense.
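If you want to sanity-check these tier numbers yourself, the arithmetic is just parameter count times bytes per weight; the text encoder, VAE, and activations add a few gigabytes on top, which is why offloading the text encoder matters on tight cards. A back-of-the-envelope sketch (the 4.5-bit figure for Q4 GGUF is an approximation that accounts for block scales, and the parameter counts come from the tiers above):

```python
# Weight memory alone: parameters x bits per weight, converted to GiB.
# Real footprints add the text encoder, VAE, and activations on top.
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

for name, size_b in [("Flux.2 Dev", 32), ("Flux.1 Dev", 12), ("SD 3.5 Large", 8.1)]:
    print(f"{name}: {weight_gb(size_b, 16):.0f}GB FP16, "
          f"{weight_gb(size_b, 8):.0f}GB FP8, "
          f"{weight_gb(size_b, 4.5):.0f}GB Q4 GGUF (approx.)")
```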

Your software choice changes the answer by 40 percent

This part gets left out of nearly every guide that currently ranks for this keyword.

ComfyUI uses roughly 30 to 40 percent less VRAM than Automatic1111 for the same SDXL workflow. That is the difference between an 8GB card crashing constantly in A1111 and running SDXL cleanly in ComfyUI.

The common advice (8GB minimum for SDXL, 12GB recommended, 16GB safe) was written against A1111's memory footprint and never updated. It was wrong on ComfyUI from day one.

Forge is the maintained A1111 fork that closed most of that gap. Running vanilla A1111 in 2026 is like running Internet Explorer: it works, it eats memory for no reason, and a better free alternative exists with the same interface. Use Forge if you want the A1111 interface, ComfyUI if you want control.

One other software tax: ControlNet adds 1 to 3GB per loaded model, and stacking three can push a 16GB card into offloading. LoRAs add about 0.2GB each and rarely matter.
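Rather than trusting any of these rules of thumb, you can measure what your own workflow costs. PyTorch exposes the numbers directly; a minimal sketch using standard torch.cuda calls, run from the same process as your pipeline (the reporting helper is ours, not part of any UI):

```python
import torch

def report_vram(label: str) -> None:
    """Print live and peak VRAM use for the default CUDA device."""
    free, total = torch.cuda.mem_get_info()      # bytes free / total on the card
    now = torch.cuda.memory_allocated()          # bytes currently held by tensors
    peak = torch.cuda.max_memory_allocated()     # high-water mark since last reset
    gib = 1024 ** 3
    print(f"{label}: allocated {now / gib:.1f} GiB, peak {peak / gib:.1f} GiB, "
          f"free {free / gib:.1f} of {total / gib:.1f} GiB")

torch.cuda.reset_peak_memory_stats()
# ... load your pipeline, attach ControlNets and LoRAs, run a generation here ...
report_vram("after generation")
```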

Quantization is the other half of the answer

Quantization compresses model weights. GGUF, originally from llama.cpp, got adapted for diffusion models by a developer called city96 and is now the default way to run Flux on consumer hardware.

The ladder runs Q2 (fastest, visible quality loss), Q4 (practical floor), Q6 (very close to full quality), Q8 (near-lossless), and full precision at FP16 or BF16. Below Q4, degradation shows up in hands, faces, and fine text. At Q8, almost no one can tell the difference.

Hardware matters too. FP8 runs natively on RTX 40 series, RTX 50 series, and Hopper datacenter GPUs like the H100; the older A100 does not have FP8 Tensor Cores. FP4 runs natively only on Blackwell, which means the RTX 5090 gets roughly 2x inference speed over the 4090 on the same Flux workload. Nvidia's TensorRT stacks on top of FP8 to compound the speedup.

For Flux.1 Dev: FP16 is 24GB, FP8 or Q8 GGUF is 12GB, Q4 GGUF is 6 to 8GB. Q8 is the right call for serious work, Q4 is the floor. Running FP16 locally is mostly vanity unless you have 24GB+ and have actually compared side by side. Most people cannot tell the difference.
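For anyone running Flux through diffusers rather than a UI, recent diffusers builds can load city96-style GGUF checkpoints directly. A sketch of roughly what that looks like, assuming a diffusers version with GGUF support; the file path is a hypothetical local Q8 quant and the prompt is arbitrary:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load the 12B transformer from a Q8 GGUF file; weights stay quantized,
# compute runs in bfloat16.
transformer = FluxTransformer2DModel.from_single_file(
    "flux1-dev-Q8_0.gguf",  # hypothetical local path to a city96-style quant
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # moves idle components (text encoder, VAE) off the GPU

image = pipe(
    "a workbench photographed at golden hour",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_q8.png")
```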

The used RTX 3090 beats the RTX 5080 for AI, and it is not close

This is the recommendation gaming-card review sites cannot quite bring themselves to make, because the RTX 5080 is a legitimately good gaming card. For AI, it is a trap.

The 5080 ships with 16GB of GDDR7 at around $1,200. The five-year-old RTX 3090 came with 24GB of GDDR6X and sells used for $700 to $1,050.

What the 5080's 16GB cannot do: run Flux.1 Dev at FP16 (24GB), run Stable Diffusion 3.5 Large unquantized (18GB), run Flux.2 Dev at FP8 (32GB, permanently out of reach), or train Flux LoRAs comfortably (24GB).

What the 3090's 24GB does: runs Flux.1 Dev FP16 with headroom, runs SD 3.5 Large unquantized, supports SDXL LoRA training with fused backward pass, and supports NVLink (a feature Nvidia dropped from RTX 40 and 50 series).

The 5080 is meaningfully faster than a 3090 on workloads that fit in 16GB, thanks to 5th-gen Tensor Cores with FP8 and FP4 support. That is a real advantage, and it does not compensate for being locked out of half the models you would buy the card to run.

The RTX 5090 is the only consumer card with enough VRAM to run Flux.2 Dev at FP8 without text-encoder offloading. At $3,500 street (MSRP $2,000, scalpers still control supply), it is justified only if Flux.2 Dev work pays for the card. For everyone else: a used 3090 for most people, a used 4090 if you want 40 to 70 percent more speed, a 5090 for Flux.2 Dev professionals. Skip the 5080 entirely if AI is your primary use case. If you are building the rest of the machine around this GPU, our 2026 PC build guide covers the CPU, memory, and PSU pairings that do not bottleneck an AI-first workload.

LoRA training is different math from inference

LoRA training teaches a model to generate a specific character, style, or concept. It takes roughly twice the VRAM of inference for the same model, because training requires gradients, optimizer states, and the full forward and backward pass held in memory simultaneously.

SDXL LoRA training used to need 24GB. A Kohya-ss feature called fused backward pass dropped the comfortable requirement to 12GB, putting it within reach of RTX 3060 12GB owners. Flux.1 LoRA training needs 24GB with Q8 quantization. Flux.2 LoRA on 8GB with Q4 GGUF is "possible" in the sense that it takes three to four hours per attempt. For anyone actually iterating, the honest floor is 24GB.

Full fine-tuning of Flux-scale models requires datacenter hardware (80GB+) and is not a consumer question. If you plan to train, a used 3090 is the cheapest route to getting work done.
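The "roughly twice inference" rule of thumb falls out of what training keeps resident: the frozen base weights, activations saved for the backward pass, and the LoRA parameters with their gradients and Adam optimizer states. A rough accounting sketch; the LoRA size and activation budgets are placeholder assumptions that swing with rank, resolution, batch size, and gradient checkpointing:

```python
def lora_training_vram_gb(base_params_b: float, base_bits: float,
                          lora_params_m: float = 50.0,
                          activations_gb: float = 6.0) -> float:
    """Very rough LoRA training footprint in GiB: frozen base weights, plus
    LoRA weights, gradients, and two Adam moments (all assumed FP32), plus
    an assumed activation budget."""
    gib = 1024 ** 3
    base = base_params_b * 1e9 * base_bits / 8 / gib
    lora = lora_params_m * 1e6 * 4 * 4 / gib   # weights + grads + 2 Adam moments, FP32
    return base + lora + activations_gb

print(f"SDXL LoRA (2.6B base, FP16): ~{lora_training_vram_gb(2.6, 16):.0f} GiB")
# Larger activation budget assumed for the bigger model
print(f"Flux.1 LoRA (12B base, Q8):  ~{lora_training_vram_gb(12, 8, activations_gb=10):.0f} GiB")
```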

Honest hardware picks by budget

Under $500 used. RTX 3060 12GB, roughly $200 to $250. Handles SDXL and SD 3.5 Medium cleanly. Flux.1 Dev requires Q4 GGUF and patience.

$700 to $1,050 used. RTX 3090 24GB. The unambiguous mainstream pick. Every mainstream-tier model runs comfortably. Only consumer card with NVLink if you want to pair two.

$700 used, $1,100+ new. RTX 4070 Ti Super 16GB. MSRP was $799; new stock climbed above $1,100 in the 2026 GPU crunch, making used the only sensible entry point. 16GB limits you to GGUF for Flux.

$1,100 to $1,800 used. RTX 4090 24GB. Production ceased October 2024; used prices stay firm because AI buyers keep snapping them up. Same 24GB as a 3090, 40 to 70 percent faster on Stable Diffusion. Worth it over a used 3090 if raw throughput matters.

$3,500 new. RTX 5090 32GB. Only consumer card for Flux.2 Dev FP8. Justify it with workload, not enthusiasm.

Cloud instead. RTX 5090 starts at $0.27/hr on SaladCloud, $0.76/hr on Spheron. H100 PCIe is $2.01/hr. Break-even against an $800 used 3090 is roughly 1,000 to 3,000 hours depending on provider. If you work fewer than a few hours a day, rent.

Skip entirely for AI: any 8GB card, the RTX 5080, and Apple Silicon as a primary AI machine. A Mac Studio with 128GB of unified memory runs Flux two to four times slower than a 24GB CUDA card.
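The rent-versus-buy break-even quoted above is just division, but it is worth rerunning with your own card price and hourly rate. The sketch below uses the figures from this article and ignores electricity and resale value:

```python
def break_even_hours(card_price_usd: float, hourly_rate_usd: float) -> float:
    return card_price_usd / hourly_rate_usd

USED_3090 = 800  # roughly the middle of the used 3090 range quoted above
for provider, rate in [("SaladCloud RTX 5090", 0.27),
                       ("Spheron RTX 5090", 0.76),
                       ("H100 PCIe", 2.01)]:
    print(f"{provider}: ~{break_even_hours(USED_3090, rate):,.0f} hours to break even")
```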

The honest answer

The question answers itself once you rephrase it. Not "how much VRAM for Stable Diffusion," but which model, which software, which quantization. The floor is 12GB. The correct answer is 24GB. The cheapest correct answer is a used RTX 3090. Anything else is either a compromise you should understand before paying for it, or Nvidia's marketing working on you. For more on the models that justify buying the card in the first place, our AI coverage tracks what each tool is actually good at in 2026.

Frequently Asked Questions

What is the minimum VRAM to run Stable Diffusion in 2026?

4GB will technically run Stable Diffusion 1.5, and 8GB will run SDXL in ComfyUI. But for any model released in the last eighteen months, including Flux.1 Dev, SD 3.5 Large, or Flux.2 Klein, 12GB is the honest floor. Below 12GB, you are locked into legacy models and workarounds.

Is 16GB of VRAM enough for Stable Diffusion?

For SDXL, SD 3.5 Medium, and Flux.1 Dev via GGUF quantization, yes. For Flux.1 Dev at FP16, SD 3.5 Large unquantized, or Flux.2 Dev at any precision, no. This is why a $1,200 RTX 5080 with 16GB is a worse AI buy than a $900 used RTX 3090 with 24GB, despite the 5080 being a newer and faster gaming card.

Is a used RTX 3090 better than a new RTX 4070 Ti Super for AI?

Yes, clearly. The 3090 has 24GB of VRAM compared to the 4070 Ti Super's 16GB, which is the single most important spec for AI inference and training. The 4070 Ti Super is roughly 30 percent faster on workloads that fit in 16GB, but the 3090 can run 50 percent more of the current model landscape. VRAM capacity outranks raw speed for AI use.

Can I train a LoRA on 12GB of VRAM?

SDXL LoRA training works cleanly on 12GB with Kohya-ss and fused backward pass enabled. Flux.1 LoRA training requires 24GB minimum for reasonable iteration speed. Flux.2 LoRA on 12GB is technically possible with Q4 GGUF but takes several hours per attempt, which is not practical for real work.

Do I need an Nvidia GPU, or can I use AMD for Stable Diffusion?

AMD cards can run Stable Diffusion through ROCm on Linux and DirectML on Windows, but the ecosystem is significantly behind Nvidia's CUDA stack. Expect longer setup, missing optimizations, slower inference, and broken compatibility with newer features like FP8 quantization and TensorRT. For anyone who is not actively choosing AMD on principle, an Nvidia card is the right answer.

What is GGUF quantization and why does it matter for Flux?

GGUF is a compressed model format originally built for llama.cpp and adapted for diffusion models by city96. It cuts VRAM usage dramatically with minimal quality loss at Q8, and meaningful but acceptable loss at Q4. GGUF is the reason Flux.1 Dev runs on 12GB cards at all. Q8 is the sweet spot for serious work; Q4 is the practical floor before hands and text start degrading.

Written by Alex Chen

Technology journalist who has spent over a decade covering AI, cybersecurity, and software development. Former contributor to major tech publications. Writes about the tools, systems, and policies shaping the technology landscape, from machine learning breakthroughs to defense applications of emerging tech.
