This directory contains the training-related specifications for all the models we support in finetrainers. Each model page has:
- an example training command
- an inference example
- numbers on memory consumption
By default, we don't include any validation-related arguments in the example training commands. To enable validation inference every 500 steps, one can add the following arguments:
```diff
+ --validation_dataset_file <Path to a CSV/JSON/PARQUET/ARROW> \
+ --validation_steps 500
```
Arguments for training are documented in the code. For more information, please run `python train.py --help`.
The following table shows the algorithms supported for training and the models they are supported for:
| Model | SFT | Control | ControlNet | Distillation |
|---|---|---|---|---|
| CogVideoX | 🤗 | 😡 | 😡 | 😡 |
| LTX-Video | 🤗 | 😡 | 😡 | 😡 |
| HunyuanVideo | 🤗 | 😡 | 😡 | 😡 |
For launching SFT training, the following `--training_type` options are available (see the sketch after this list for how they fit into a launch command):
- `--training_type lora`: Trains a new set of low-rank weights for the model, yielding a smaller adapter model. Currently, only LoRA from 🤗 PEFT is supported.
- `--training_type full-finetune`: Trains the full-rank weights of the model, yielding a fully finetuned model.
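As a minimal sketch, assuming training is launched through `accelerate launch` with one of the provided config files, a LoRA SFT run might look like the command below. Every value in angle brackets is a placeholder; the per-model pages contain complete, tested commands.

```sh
# Minimal sketch of a LoRA SFT launch. Paths and repo IDs are placeholders;
# refer to the per-model pages for complete, tested commands.
accelerate launch --config_file <PATH_TO_ACCELERATE_CONFIG> train.py \
  --pretrained_model_name_or_path <HF_REPO_ID_OR_LOCAL_PATH> \
  --training_type lora \
  --output_dir <PATH_TO_OUTPUT_DIR>
```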
Any model architecture loadable in diffusers/transformers for the above models can be used for training. For example, SkyReels-T2V is a finetune of HunyuanVideo and is compatible with continued training out of the box. Custom models can be loaded either by writing your own [ModelSpecification](TODO(aryan): add link) or by using the following set of arguments:
- `--tokenizer_id`, `--tokenizer_2_id`, `--tokenizer_3_id`: The tokenizers to use for training, in conjunction with the text encoder conditioning models.
- `--text_encoder_id`, `--text_encoder_2_id`, `--text_encoder_3_id`: The text encoder conditioning models.
- `--transformer_id`: The transformer model to use for training.
- `--vae_id`: The VAE model to use for training.
The above arguments should take care of most training scenarios. For any custom training scenario, please use your own implementation of a `ModelSpecification`. These arguments should be used only if you want to override the default components loaded from `--pretrained_model_name_or_path`. For each of these arguments, there is a corresponding `--<ARG>_dtype` argument to specify the precision of that component, as in the sketch below.
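For example, to continue training from a finetuned transformer (such as SkyReels-T2V) while keeping the rest of the components from the base checkpoint, one could, as a hedged sketch, add the following to the training command; the repo ID and dtype value are placeholders:

```diff
+ --transformer_id <HF_REPO_ID_OF_FINETUNED_TRANSFORMER> \
+ --transformer_dtype <DTYPE>
```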
To resume training, the following arguments can be used (an example follows the list):
- `--checkpointing_steps`: The interval of training steps after which the training state should be saved.
- `--checkpointing_limit`: The maximum number of checkpoints to keep. If the limit is reached, the oldest checkpoint is purged.
- `--resume_from_checkpoint <STEP_OR_LATEST>`: Can be an integer or the string `"latest"`. If an integer is provided, training will resume from that step if a corresponding checkpoint exists. If `"latest"` is provided, training will resume from the latest checkpoint in the `--output_dir`.
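As a sketch, these arguments could be appended to a training command as shown below; the specific values (saving every 500 steps, keeping at most 2 checkpoints) are only illustrative:

```diff
+ --checkpointing_steps 500 \
+ --checkpointing_limit 2 \
+ --resume_from_checkpoint latest
```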
> [!IMPORTANT]
> The `--resume_from_checkpoint` argument is only compatible if the parallel backend and degrees of parallelism are the same as in the previous training run. For example, changing `--dp_degree 2 --dp_shards 1` in a past run to `--dp_degree 1 --dp_shards 2` in the current run will not work.
The accelerate config files (the ones seen here) that are supplied while launching training should contain a field called `mixed_precision`, and `accelerate` makes use of it if specified. We don't let users pass it explicitly from the CLI args because having both `transformer_dtype` and `mixed_precision` in the codebase can be confusing.
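For reference, the relevant portion of an accelerate config file looks roughly like the sketch below; only the `mixed_precision` field matters for this discussion, and the remaining fields are illustrative values for a single-node, two-GPU run:

```yaml
# Sketch of an accelerate config; only mixed_precision is relevant here,
# the other fields are illustrative for a single-node, two-GPU run.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: 'bf16'
num_machines: 1
num_processes: 2
```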
`transformer_dtype` is the ultimate source of truth for the precision to be used when training. It will also most likely always have to be `torch.bfloat16`, because:
- All models currently supported (except Cog-2b) do not work well in FP16 for inference, so training would be broken as well. This can be revisited if it makes sense to train other, newly added models in FP16.
- The `accelerate` config files default to "bf16"; modifying that is at the user's own risk and assumes they understand the significance of their changes.