How to Train a Custom AI Image Model: LoRA, Dreambooth, and Fine-Tuning

May 3, 2026 · Editorial Team · 8 min read · tutorial image-generation fine-tuning

The difference between a generic AI image generator and a custom-trained one is the difference between "a man in a suit" and "photos of Alex, the actual person." Generic models know what suits look like. They don't know Alex. Training your own model, or more precisely, training a LoRA adapter on top of an existing model, is how you bridge that gap.

This guide covers the practical path from zero to a working custom model. It focuses on two main approaches: LoRA fine-tuning (which is fast, lightweight, and the method most people should start with) and full Dreambooth fine-tuning (which is slower and heavier but can go further). It covers Flux and SDXL as the base models, and four practical training environments: kohya_ss locally, Replicate, Fal.ai, and Civitai's trainer.

What you're actually doing when you train a custom model

Training an AI image model from scratch requires millions of images and weeks of compute time on dozens of GPUs. That's not what most people mean when they talk about custom model training. What most people do is fine-tuning: taking an already-trained model and adjusting it to recognize or reproduce something new.

LoRA (Low-Rank Adaptation) is the most practical fine-tuning method available today. Instead of retraining the entire model, a LoRA trains a small adapter: a set of weight adjustments that layer on top of the base model. The resulting file is small (roughly 20MB to 500MB depending on rank and base model), often trains in under an hour on consumer hardware, and can be applied to any compatible base model checkpoint.
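To make the adapter idea concrete, here is a minimal sketch in plain Python. The 3072 x 3072 layer size, the alpha / rank scaling convention, and the helper names are illustrative assumptions, not details taken from any particular trainer.

```python
# Illustrative sketch of a low-rank update; sizes and names are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, up, down, alpha):
    """Return weight + (alpha / rank) * (up @ down) without modifying weight."""
    rank = len(down)  # down has shape (rank, d_in)
    scale = alpha / rank
    delta = matmul(up, down)  # shape (d_out, d_in), same as weight
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def lora_param_count(d_out, d_in, rank):
    """Trainable values in one adapter: up is (d_out, rank), down is (rank, d_in)."""
    return rank * (d_out + d_in)


# Compare the full weight count of one layer with its rank-16 adapter.
full = 3072 * 3072
adapter = lora_param_count(3072, 3072, 16)
print(full, adapter)
```

Only the two small matrices are trained and saved, which is why the adapter file stays small while the base model is left untouched.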

The tradeoff is that LoRAs work best for style and subject consistency. They're excellent for learning a specific person's appearance, a particular artistic style, or a product with distinctive visual features. They're not suited for teaching the model entirely new concepts that require deep structural changes.

Dreambooth was Google's original method for fine-tuning diffusion models on custom subjects. It updates the actual model weights rather than using an adapter. This makes it more powerful but also heavier: Dreambooth checkpoints are large (2-7GB for Stable Diffusion and SDXL, and considerably more for a 12B model like Flux), training takes longer, and the process requires more GPU VRAM. Modern Dreambooth implementations often use LoRA internally (Dreambooth LoRA), which gives you most of the power at much lower resource cost. When people say "Dreambooth" today, they often mean this hybrid approach.

Choosing your base model: Flux vs SDXL

The two main base models worth fine-tuning in 2026 are Flux (specifically Flux.1-dev) and SDXL (Stable Diffusion XL).

Flux.1-dev is the better base model for most new training work. It handles prompts more literally, produces better photorealistic output, and its LoRA ecosystem has matured significantly. The 12B parameter model requires more VRAM than SDXL (24GB is comfortable, 16GB is workable with optimizations), but the output quality justifies it for any serious project. Flux LoRAs trained on 15-20 images of a subject can produce consistent, high-quality results.

SDXL is more forgiving on hardware (a 12-16GB GPU handles it), has a larger existing ecosystem of pre-trained LoRAs on Civitai, and is faster to train. If you're running on older hardware or need to iterate quickly, SDXL remains a reasonable choice. The output quality ceiling is lower than Flux, but for many use cases that ceiling is still more than sufficient.

If you're starting fresh and have access to a 24GB GPU (or are using a cloud service), train on Flux. If you're hardware-constrained, SDXL is the pragmatic choice.
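The rule of thumb above can be written as a small helper. This is only a sketch of the guidance in this section: the function name and return strings are hypothetical, and the fallback to cloud training below 12GB is an inference from the hardware figures, not a stated rule.

```python
def choose_base_model(vram_gb, using_cloud=False):
    """Map available VRAM to a base model, following the thresholds in the text."""
    if using_cloud or vram_gb >= 24:
        return "flux1-dev"
    if vram_gb >= 16:
        # 16GB is described as workable for Flux only with optimizations.
        return "flux1-dev (with memory optimizations) or sdxl"
    if vram_gb >= 12:
        return "sdxl"
    # Assumption: below 12GB, local training is impractical for either model.
    return "cloud training"


for vram in (24, 16, 12, 8):
    print(vram, "->", choose_base_model(vram))
```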

Preparing your training dataset

Dataset quality matters more than almost anything else in fine-tuning. Ten well-chosen, well-captioned images will outperform 100 inconsistent ones.

For a subject/person LoRA:

15-25 images is a typical range. More isn't always better: with too many similar images, the model can overfit to specific poses or backgrounds.
Vary lighting, backgrounds, angles, and expressions. If all your training images have the same indoor lighting against a white wall, the model will associate those conditions with your subject.
Avoid images where the subject is partially occluded, low-resolution, or blurry.
Mixed full-body and head shots work better than exclusively one type.

For a style LoRA:

50-100 images from the target style is a reasonable dataset for most artistic styles.
Consistency matters more than quantity. If you're training on a specific artist's work, try to capture the range of their style rather than just their most famous pieces.
Remove outliers: images that are stylistically different from what you want the model to learn.
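A quick sanity check on dataset size can catch problems before any GPU time is spent. The sketch below only counts files; the extension list and function names are assumptions for illustration, and it says nothing about image quality or variety, which still need a manual review.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
# Typical ranges quoted in this guide.
RECOMMENDED_COUNTS = {"subject": (15, 25), "style": (50, 100)}


def list_images(folder):
    """Return image files directly inside folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.suffix.lower() in IMAGE_EXTENSIONS)


def check_dataset_size(folder, kind):
    """Compare the image count with the typical range for a subject or style LoRA."""
    low, high = RECOMMENDED_COUNTS[kind]
    count = len(list_images(folder))
    if count < low:
        return f"{count} images: below the typical {low}-{high} range for a {kind} LoRA"
    if count > high:
        return f"{count} images: above the typical {low}-{high} range, consider pruning"
    return f"{count} images: within the typical {low}-{high} range"


# Example call (hypothetical folder name):
# print(check_dataset_size("dataset/alex", "subject"))
```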

Captioning your images is the step most beginners skip. Each image needs a text caption that describes what's in it. For a subject LoRA, these captions use a trigger word (a unique identifier like "ohwx person" that you'll use in prompts later) combined with descriptive text: "ohwx person sitting at a desk in an office, blue shirt, looking at camera."

Auto-captioning tools include WD14 tagger (good for anime/illustration style) and BLIP-2 or LLaVA (good for photorealistic subjects). For small datasets, you can also caption manually.
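For manual captioning, a short script can keep the trigger word consistent across files. This sketch assumes your trainer is configured to read captions from a .txt file with the same base name as each image; the function names and the example filename are hypothetical.

```python
from pathlib import Path


def build_caption(trigger_word, description):
    """Prefix the description with the trigger word."""
    description = description.strip()
    return f"{trigger_word} {description}" if description else trigger_word


def write_caption_files(image_dir, descriptions, trigger_word):
    """Write one .txt caption next to each image.

    descriptions maps an image filename to its descriptive text.
    """
    written = []
    for name, description in descriptions.items():
        caption_path = (Path(image_dir) / name).with_suffix(".txt")
        caption_path.write_text(build_caption(trigger_word, description),
                                encoding="utf-8")
        written.append(caption_path)
    return written


print(build_caption("ohwx person",
                    "sitting at a desk in an office, blue shirt, looking at camera"))
```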

Method 1: Training locally with kohya_ss

kohya_ss is the most widely used local training tool for Stable Diffusion and Flux LoRAs. It's a Python-based GUI that wraps the underlying training scripts in a usable interface.

Setup requirements:

Python 3.10 or 3.11
CUDA-capable GPU with at least 12GB VRAM (24GB recommended for Flux; 16GB is workable with optimizations)
Git

Installation:

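# Clone the GUI together with its training-script submodule
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
# Create and activate an isolated Python environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
# Install a CUDA 12.1 build of PyTorch, then the project's dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# Launch the GUI
python kohya_gui.py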

Key settings for a Flux LoRA (flux1-dev base):

Pretrained model: path to your flux1-dev weights (downloadable from Hugging Face with an account)
Training type: FLUX.1 LoRA
Network rank (Dim): 16 for a focused subject LoRA, 32-64 for a more complex style LoRA. Higher rank = larger file + longer training + more capacity.
Network alpha: set to half your network rank (8 for rank 16, 16 for rank 32)
Learning rate: 1e-4 as the main learning rate for the LoRA layers. In basic setups, leave the separate unet and text encoder learning rate fields at 0.
Steps: 1000-2000 steps for a 15-20 image dataset. Save checkpoints every 200-500 steps.
Batch size: 1 if you're on 16GB VRAM, 2-4 if you have 24GB+
Resolution: 1024x1024 for Flux
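The settings above are related to each other, and a small helper makes the relationships explicit. This is a sketch only: the dictionary keys are illustrative names, not kohya_ss configuration keys, and the step formula assumes the common convention in which each image is repeated a fixed number of times per epoch.

```python
import math


def derive_settings(rank, vram_gb):
    """Collect the starting values suggested in this guide for a Flux LoRA."""
    return {
        "network_rank": rank,
        "network_alpha": rank // 2,        # half the rank, as suggested above
        "learning_rate": 1e-4,
        "batch_size": 2 if vram_gb >= 24 else 1,
        "resolution": (1024, 1024),
    }


def total_steps(num_images, repeats, epochs, batch_size):
    """Optimizer steps when every image is seen `repeats` times per epoch."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


settings = derive_settings(rank=16, vram_gb=16)
print(settings)
# 20 images, 10 repeats, 5 epochs at batch size 1 lands in the 1000-2000 range.
print(total_steps(20, 10, 5, settings["batch_size"]))
```

Note that doubling the batch size halves the step count for the same number of image passes, so step targets are only comparable at a fixed batch size.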

Saving and testing: kohya_ss saves .safetensors files. Load them in ComfyUI using the LoRA loader node or, for SDXL LoRAs, in AUTOMATIC1111 using the LoRA tab, and test with prompts that include your trigger word.
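Before loading a saved file, it can help to confirm that it has the rank you intended. A .safetensors file starts with an 8-byte little-endian header length followed by a JSON header, so the standard library is enough to inspect it. The tensor-name patterns below (lora_down, lora_A) are assumptions about common naming schemes and may not match every trainer's output.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        return json.loads(f.read(header_len))


def lora_ranks(header):
    """Collect the distinct ranks found in down-projection tensors."""
    ranks = set()
    for name, info in header.items():
        if name == "__metadata__":
            continue  # optional free-form metadata, not a tensor
        # Assumed naming: the down projection has shape (rank, in_features, ...).
        if "lora_down" in name or "lora_A" in name:
            ranks.add(info["shape"][0])
    return sorted(ranks)


# Example call (hypothetical filename):
# print(lora_ranks(read_safetensors_header("ohwx_person.safetensors")))
```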

Method 2: Replicate for cloud training

Replicate offers a managed training API that removes the local setup entirely. If you don't want to manage GPU drivers and Python environments, this is the fastest path to a working LoRA.

Replicate has a dedicated Flux fine-tuning trainer. The workflow:

Upload your training images as a zip file (or provide URLs)
Configure training parameters through their web UI or API
Replicate runs the training on their infrastructure
The resulting LoRA is stored on your Replicate account and immediately runnable through their API

Cost: Replicate charges per second of GPU time. A typical Flux LoRA training run (20 images, 1000 steps) costs roughly $2-5 depending on GPU tier. For experimentation, this is more economical than maintaining a local 24GB GPU.
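Because billing is per second, the cost of a run is simple arithmetic. The rate in this sketch is a hypothetical placeholder chosen only to land inside the $2-5 range quoted above; check the provider's current pricing for the GPU tier you use.

```python
def training_cost_usd(gpu_seconds, usd_per_second):
    """Cost of one run when billed per second of GPU time."""
    return round(gpu_seconds * usd_per_second, 2)


# Hypothetical numbers: a 20-minute run at an assumed $0.0025 per second.
HYPOTHETICAL_RATE = 0.0025
print(training_cost_usd(20 * 60, HYPOTHETICAL_RATE))
```

Multiplying the per-run figure by the number of experiments you expect to run gives a rough basis for comparing cloud training with buying a local GPU.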

API-based workflow (useful for automation):

import replicate

training = replicate.trainings.create(
    version="ostris/flux-dev-lora-trainer:...",
    input={
        "input_images": open("training_images.zip", "rb"),
        "steps": 1000,
        "lora_rank": 16,
        "trigger_word": "ohwx person",
    },
    destination="yourusername/your-custom-model",
)

The output model is then callable as a standard Replicate prediction.
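A prediction call might look like the sketch below. It assumes the replicate package is installed and REPLICATE_API_TOKEN is set, that the trained model accepts a "prompt" input, and that model_ref is the destination from training followed by a version ID you copy from your account. The helper names are hypothetical.

```python
def build_prompt(trigger_word, scene):
    """Compose a prompt that includes the trigger word used during training."""
    return f"photo of {trigger_word}, {scene}"


def run_prediction(model_ref, prompt):
    """Run the fine-tuned model on Replicate and return its output.

    model_ref is a placeholder such as "yourusername/your-custom-model:<version-id>".
    """
    import replicate  # imported here so build_prompt works without the package
    return replicate.run(model_ref, input={"prompt": prompt})


prompt = build_prompt("ohwx person", "hiking on a mountain trail at sunset")
print(prompt)
# output = run_prediction("yourusername/your-custom-model:<version-id>", prompt)
```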

Method 3: Fal.ai for fast cloud training

Fal.ai has become a strong option specifically for Flux LoRA training. Their Flux trainer is fast (many runs complete in under 10 minutes), the interface is simple, and they expose both a web UI and a clean Python/JavaScript SDK.

The Fal.ai workflow is similar to Replicate but with a focus on speed over configurability. For quick subject LoRAs where you don't need to tune every hyperparameter, Fal.ai is often the fastest path:

import fal_client

result = fal_client.subscribe(
    "fal-ai/flux-lora-fast-training",
    arguments={
        "images_data_url": "https://your-bucket.com/images.zip",
        "trigger_word": "ohwx person",
        "steps": 1000,
    },
)

The returned LoRA URL is immediately usable with Fal.ai's Flux inference endpoints. Fal.ai also supports combining multiple LoRAs at inference time, which is useful for combining a style LoRA with a subject LoRA.
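Combining a subject LoRA with a style LoRA could be sketched as follows. The endpoint name "fal-ai/flux-lora" and the "loras" field with "path" and "scale" entries are assumptions about Fal.ai's inference API that should be checked against their current documentation; the URLs are placeholders.

```python
def build_lora_arguments(prompt, lora_urls, scale=1.0):
    """Build an inference payload that applies several LoRAs at the same scale."""
    return {
        "prompt": prompt,
        "loras": [{"path": url, "scale": scale} for url in lora_urls],
    }


def generate(arguments, endpoint="fal-ai/flux-lora"):
    """Submit the request; requires the fal_client package and a FAL_KEY."""
    import fal_client  # imported here so the payload builder works offline
    return fal_client.subscribe(endpoint, arguments=arguments)


arguments = build_lora_arguments(
    "ohwx person reading in a cafe, watercolor style",
    ["https://your-bucket.com/subject_lora.safetensors",
     "https://your-bucket.com/style_lora.safetensors"],
    scale=0.8,
)
print(arguments)
# result = generate(arguments)
```

Lowering the scale on one of the LoRAs is a common way to keep a style adapter from overpowering the subject.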

Method 4: Civitai's browser-based trainer

Civitai is primarily known as a model-sharing community, but they added a browser-based LoRA trainer that handles SDXL training without any local setup. It's the most beginner-accessible option.

Upload your images through the browser, configure basic settings (trigger word, training steps, LoRA rank), and Civitai runs the training on their infrastructure. The resulting LoRA is automatically added to your Civitai profile where you can share it publicly or keep it private.

Limitations: Civitai's trainer currently focuses on SDXL, not Flux. If you need Flux LoRAs, use Replicate or Fal.ai. Civitai shines for people who want to share their trained models with the community and don't need the output for API integration.

Common problems and how to fix them

Overfit model (generates the training images too literally): Reduce training steps. If you trained for 2000 steps and the model reproduces backgrounds from your training set, try 1000 or even 800.
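If you saved intermediate checkpoints during training, you may be able to test an earlier one before retraining from scratch. The sketch below just lists which step counts would have a saved file, assuming the trainer saves at a fixed interval and again at the end; the function name is hypothetical.

```python
def checkpoint_steps(total_steps, save_every):
    """Steps at which a checkpoint exists, assuming a final save at total_steps."""
    steps = list(range(save_every, total_steps + 1, save_every))
    if not steps or steps[-1] != total_steps:
        steps.append(total_steps)
    return steps


# For a 2000-step run saved every 400 steps, try the earlier files first.
print(checkpoint_steps(2000, 400))
```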

Underfit model (trigger word doesn't work reliably): Increase training steps or raise the network rank. For Flux, a rank of 32 tends to produce more reliable concept learning than rank 8-16.

Style bleed (subject LoRA changes image style): Your captions aren't specific enough. If the captions don't describe the visual style of each image (medium, lighting, color treatment), the model may attach that style to the trigger word along with the subject. Rewrite the captions to describe the style explicitly, in addition to what's happening in each image.

Bad anatomy on Flux LoRA: Flux is generally strong on anatomy. If you're seeing deformations, check that your training images don't include any bad anatomy, because the model can learn errors as readily as correct features.

LoRA too large: Reduce network rank. Rank 4-8 produces small files (20-50MB) that load faster and work well for simple concepts. Save rank 32-64 for complex style learning.
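Adapter size grows roughly linearly with rank, so you can estimate the effect of a rank change from one file you already have. This sketch assumes the same target layers and data type at both ranks and ignores the small fixed overhead of file metadata; the numbers are hypothetical.

```python
def estimate_lora_size_mb(known_size_mb, known_rank, new_rank):
    """Scale a measured file size linearly to a different rank."""
    return known_size_mb * new_rank / known_rank


# Hypothetical example: a 160MB file trained at rank 32, retrained at rank 8.
print(estimate_lora_size_mb(160, 32, 8))
```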

Testing and using your trained model

Once you have a working LoRA, use it through ComfyUI (for local Flux workflows), AUTOMATIC1111 (for SDXL), the Fal.ai API, or Replicate's inference API.

For quality evaluation, generate 20-30 images across varied prompts that include your trigger word. Check for:

Consistency: does the subject look the same across different backgrounds and prompts?
Bleed: does the LoRA affect image quality when the trigger word isn't present?
Prompt adherence: can you still control pose, lighting, and style independently from the subject?
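One way to make these checks repeatable is to generate a fixed grid of prompts, each in a version with the trigger word and a control version without it. The scenes, styles, and the "a person" control subject below are illustrative assumptions; substitute whatever fits your LoRA.

```python
from itertools import product


def build_eval_prompts(trigger_word, scenes, styles, control_subject="a person"):
    """Pair every scene with every style, with and without the trigger word."""
    prompts = []
    for scene, style in product(scenes, styles):
        prompts.append({"uses_trigger": True,
                        "prompt": f"{trigger_word} {scene}, {style}"})
        # Control prompt: helps reveal bleed when the trigger word is absent.
        prompts.append({"uses_trigger": False,
                        "prompt": f"{control_subject} {scene}, {style}"})
    return prompts


grid = build_eval_prompts(
    "ohwx person",
    scenes=["at a desk in an office", "walking on a beach", "in a forest at night"],
    styles=["studio lighting", "watercolor painting", "35mm film photo", "pencil sketch"],
)
print(len(grid))
for item in grid[:2]:
    print(item)
```

Reusing the same grid and the same seeds across checkpoints makes it easier to compare runs side by side.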

For building custom image generation workflows that expose your trained model through an API or UI, see the AI agent deployment best practices guide. Tools like Fal.ai and Krea AI also support loading custom LoRAs into their interfaces, which is useful for sharing trained models with clients or collaborators without requiring them to manage local tools.