Resource: Dreambooth Training README

– GitHub

README describing proper methods for constructing a dataset.

According to the README, regularization images allow for prior preservation: they help the model retain what it already learned from its existing dataset.

Regularization images should be generated from the base model before training.
The image class should be a generic term (e.g. person, cat, dog, man, woman).
Most experiments suggest that 200 to 300 regularization images per class improve training accuracy.
To include the class in a dataset, name the directory as follows: <number of repeats>_<data keyword> <class keyword>.
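
The naming convention above can be sketched in Python. This is only an illustration, not code from the README: the helper names format_dir_name and parse_dir_name, the TrainingDir type, and the assumption that the data keyword contains no spaces are all hypothetical.

```python
import re
from typing import NamedTuple


class TrainingDir(NamedTuple):
    repeats: int
    data_keyword: str
    class_keyword: str


# Pattern for "<number of repeats>_<data keyword> <class keyword>".
# Assumption: the data keyword has no spaces; the class keyword may.
_DIR_PATTERN = re.compile(r"^(\d+)_(\S+) (.+)$")


def format_dir_name(repeats: int, data_keyword: str, class_keyword: str) -> str:
    """Build a directory name that follows the convention."""
    if repeats < 1:
        raise ValueError("repeats must be a positive integer")
    return f"{repeats}_{data_keyword} {class_keyword}"


def parse_dir_name(name: str) -> TrainingDir:
    """Split a directory name back into its three parts."""
    match = _DIR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid training directory name: {name!r}")
    repeats, data_keyword, class_keyword = match.groups()
    return TrainingDir(int(repeats), data_keyword, class_keyword)
```

For example, format_dir_name(20, "sks", "person") yields the directory name "20_sks person", where "sks" stands in for whatever data keyword is chosen.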

Different data keywords can be used for a single class, allowing for varied training data within that class. For example, two sets of images of different characters can be trained under the same class.
Important Note: Previous experiments have shown that training without regularization images, or with regularization images that were not generated by a model, lets the training data overwrite the model’s existing data, damaging the model’s ability to generate images other than the training data.
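
As a rough sketch of the two-character case, the snippet below groups directory names by class keyword. The names "alice" and "bob", the repeat count of 20, and the group_by_class helper are hypothetical placeholders, not values from the README.

```python
from collections import defaultdict

# Hypothetical directory names: two characters, both in the class "person".
dir_names = ["20_alice person", "20_bob person"]


def group_by_class(names):
    """Map each class keyword to the data keywords trained under it."""
    groups = defaultdict(list)
    for name in names:
        # "<repeats>_<data keyword>" comes before the first space,
        # the class keyword after it.
        prefix, class_keyword = name.split(" ", 1)
        _repeats, data_keyword = prefix.split("_", 1)
        groups[class_keyword].append(data_keyword)
    return dict(groups)


classes = group_by_class(dir_names)
print(classes)  # {'person': ['alice', 'bob']}
```

Both data keywords end up under the single class "person", so one set of regularization images for that class would cover both characters.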