Experiment: High Accuracy Training

Using the Kohya_ss GUI, I’m experimenting with settings sourced from various tutorials in the hope of arriving at a baseline configuration that produces accurate renders while using as little time and VRAM as possible during training.

Methodology:

1.) To iterate through configurations quickly, I’m using a small dataset of 5 captioned images so that each training run finishes in approximately an hour or less.

2.) To control the experiment I’ll be adjusting several training settings:

Number of Repeats
Batch Size
LR Scheduler
Max Resolution
Max Token Length
Regularization Images ( from model vs from dataset )
Learning Rate
Offset Noise
Prior Loss Weight
Memory Efficient Attention

3.) Other settings will remain constant throughout the experiment. These include the LR Scheduler set to cosine with restarts, the optimizer set to Lion, CLIP skip set to 2, and options such as Gradient Checkpointing and Shuffle Tags enabled. I chose these because, so far, they have been the most efficient settings for low-VRAM training.

Note: Xformers has been disabled because it makes training non-deterministic, which would reduce the accuracy of the rendered images.

4.) To gauge success, the trainer will generate 10 sample renders using a prompt specific to the dataset. At each render interval, the training speed (it/s), elapsed time, and loss will also be recorded for each experiment. These measurements will show how each of the adjusted settings affects training.
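
As a rough sketch of how the measurements described above could be kept in one place, the snippet below writes one row per render interval to a CSV file. The `Measurement` class, its field names, and the `write_measurements` helper are my own hypothetical names, not part of Kohya_ss; the values would be copied by hand from the trainer's console output.

```python
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class Measurement:
    """One reading taken at a sample-render interval (hypothetical layout)."""
    experiment: str      # label for the run, e.g. the setting being varied
    step: int            # training step at which the sample was rendered
    its_per_sec: float   # training speed reported by the trainer (it/s)
    elapsed_sec: float   # wall-clock time since training started
    loss: float          # loss value reported by the trainer at this step


def write_measurements(rows, out):
    """Write measurements as CSV to any file-like object opened for text."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(Measurement)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
```

Opening the target file with `open(path, "w", newline="")` and passing it as `out` should give one comparable table per experiment, which makes it easier to line runs up side by side later.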

Initial Settings:

The following base settings are assumed to be good based on current research:

Repeats: 10
Batch Size: 1
Epochs: 1
Resolution: 512, 512
Token Length: 75
Regularization: 0
LR: 1e-5
Offset Noise: 0
Prior Loss Weight: 1.0
Memory Efficient Attention: Off
LR Scheduler: Constant
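
To make the size of a baseline run concrete, here is a minimal sketch that holds the settings above in a dictionary and estimates the number of training steps. The dictionary keys are my own shorthand, not Kohya_ss config names, and the step formula assumes steps per epoch equal images times repeats divided by batch size, with no regularization images involved. The `variant` helper is likewise hypothetical; it only illustrates changing one setting at a time against the baseline.

```python
import math

NUM_IMAGES = 5  # size of the test dataset described in the methodology

# Baseline settings from the list above (key names are illustrative only).
BASELINE = {
    "repeats": 10,
    "batch_size": 1,
    "epochs": 1,
    "resolution": (512, 512),
    "max_token_length": 75,
    "regularization_images": 0,
    "learning_rate": 1e-5,
    "offset_noise": 0.0,
    "prior_loss_weight": 1.0,
    "memory_efficient_attention": False,
    "lr_scheduler": "constant",
}


def total_steps(num_images, repeats, epochs, batch_size):
    """Estimate training steps, assuming no regularization images."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def variant(base, **overrides):
    """Return a copy of the baseline with selected settings changed."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    return {**base, **overrides}


steps = total_steps(NUM_IMAGES, BASELINE["repeats"],
                    BASELINE["epochs"], BASELINE["batch_size"])
print(steps)  # 5 images * 10 repeats * 1 epoch / batch size 1 = 50
```

Under those assumptions the baseline works out to 50 steps, and something like `variant(BASELINE, batch_size=2)` would describe a run that differs from it in exactly one setting.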

Equipment:

These experiments are being run on my local machine, which has an RTX 4090, a Ryzen 5900X, and 32 GB of DDR4 RAM.

Each experiment will be a separate post on this blog, complete with data and observations. The conclusion, along with the optimal training settings discovered, will also be posted here.