The Command Line
Below is a list of all the available 🤗 Accelerate commands with their parameters.
accelerate config
Command:
accelerate config or accelerate-config
Launches a series of prompts to create and save a default_config.yaml configuration file for your training system. Should always be run first on your machine.
Usage:
accelerate config [arguments]
Optional Arguments:
--config_file CONFIG_FILE (str): The path to use to store the config file. Will default to a file named default_config.yaml in the cache location, which is the content of the environment variable HF_HOME suffixed with "accelerate", or, if you don't have such an environment variable, your cache directory (~/.cache or the content of XDG_CACHE_HOME) suffixed with huggingface.
-h, --help (bool): Show a help message and exit.
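The default location rule described for --config_file can be sketched in a few lines of Python. This is only an illustration of the rule as worded above, not Accelerate's own code: the function name default_config_path is hypothetical, and it assumes the fallback cache directory gets an "accelerate" subdirectory under huggingface, mirroring the HF_HOME case.

```python
import os
from typing import Mapping, Optional


def default_config_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the default config file location (hypothetical helper)."""
    env = os.environ if env is None else env
    # Fallback cache directory: XDG_CACHE_HOME if set, otherwise ~/.cache.
    cache_root = env.get("XDG_CACHE_HOME", "~/.cache")
    # HF_HOME wins when set; otherwise use <cache_root>/huggingface.
    hf_home = env.get("HF_HOME", os.path.join(cache_root, "huggingface"))
    # Assumption: the file sits in an "accelerate" subdirectory in both cases.
    return os.path.join(os.path.expanduser(hf_home), "accelerate", "default_config.yaml")


# Show where the file would be looked up in the current environment.
print(default_config_path())
```

Passing a plain dictionary instead of the real environment makes it easy to check each branch of the rule in isolation.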
accelerate config default
Command:
accelerate config default or accelerate-config default
Create a default config file for Accelerate with only a few flags set.
Usage:
accelerate config default [arguments]
Optional Arguments:
--config_file CONFIG_FILE (str): The path to use to store the config file. Will default to a file named default_config.yaml in the cache location, which is the content of the environment variable HF_HOME suffixed with "accelerate", or, if you don't have such an environment variable, your cache directory (~/.cache or the content of XDG_CACHE_HOME) suffixed with huggingface.
-h, --help (bool): Show a help message and exit.
--mixed_precision {no,fp16,bf16} (str): Whether or not to use mixed precision training. Choose between FP16 and BF16 (bfloat16) training. BF16 training is only supported on NVIDIA Ampere GPUs and PyTorch 1.10 or later.
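As a small illustration, the command can be assembled from Python before being handed to a subprocess. This is a sketch under assumptions: config_default_command is a hypothetical helper, my_config.yaml is a placeholder file name, and the choice check only mirrors the values listed above.

```python
import shlex
from typing import List, Optional

# Choices listed above for `accelerate config default --mixed_precision`.
MIXED_PRECISION_CHOICES = ("no", "fp16", "bf16")


def config_default_command(mixed_precision: str = "no",
                           config_file: Optional[str] = None) -> List[str]:
    """Build the argument list for `accelerate config default` (hypothetical helper)."""
    if mixed_precision not in MIXED_PRECISION_CHOICES:
        raise ValueError(f"mixed_precision must be one of {MIXED_PRECISION_CHOICES}")
    cmd = ["accelerate", "config", "default", "--mixed_precision", mixed_precision]
    if config_file is not None:
        cmd += ["--config_file", config_file]
    return cmd


cmd = config_default_command("bf16", config_file="my_config.yaml")  # placeholder path
print(shlex.join(cmd))
# To actually run it (requires Accelerate to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```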
accelerate config update
Command:
accelerate config update or accelerate-config update
Update an existing config file with the latest defaults while maintaining the old configuration.
Usage:
accelerate config update [arguments]
Optional Arguments:
--config_file CONFIG_FILE (str): The path to the config file to update. Will default to a file named default_config.yaml in the cache location, which is the content of the environment variable HF_HOME suffixed with "accelerate", or, if you don't have such an environment variable, your cache directory (~/.cache or the content of XDG_CACHE_HOME) suffixed with huggingface.
-h, --help (bool): Show a help message and exit.
accelerate env
Command:
accelerate env or accelerate-env or python -m accelerate.commands.env
Lists the contents of the passed 🤗 Accelerate configuration file. Should always be used when opening an issue on the GitHub repository.
Usage:
accelerate env [arguments]
Optional Arguments:
--config_file CONFIG_FILE (str): The path to use to store the config file. Will default to a file named default_config.yaml in the cache location, which is the content of the environment variable HF_HOME suffixed with "accelerate", or, if you don't have such an environment variable, your cache directory (~/.cache or the content of XDG_CACHE_HOME) suffixed with huggingface.
-h, --help (bool): Show a help message and exit.
accelerate launch
Command:
accelerate launch or accelerate-launch or python -m accelerate.commands.launch
Launches a specified script on a distributed system with the right parameters.
Usage:
accelerate launch [arguments] {training_script} --{training_script-argument-1} --{training_script-argument-2} ...
Positional Arguments:
{training_script}: The full path to the script to be launched in parallel.
--{training_script-argument-1}: Arguments of the training script.
Optional Arguments:
-h, --help (bool): Show a help message and exit.
--config_file CONFIG_FILE (str): The config file to use for the default values in the launching script.
-m, --module (bool): Change each process to interpret the launch script as a Python module, executing with the same behavior as "python -m".
--no_python (bool): Skip prepending the training script with "python" and just execute it directly. Useful when the script is not a Python script.
--debug (bool): Whether to print out the torch.distributed stack trace when something fails.
-q, --quiet (bool): Silence subprocess errors from the launch stack trace to only show the relevant tracebacks. (Only applicable to DeepSpeed and single-process configurations.)
The rest of these arguments are configured through accelerate config and are read in from the specified --config_file (or default configuration) for their values. They can also be passed in manually.
Hardware Selection Arguments:
--cpu (bool): Whether or not to force the training on the CPU.
--multi_gpu (bool): Whether or not this should launch a distributed GPU training.
--tpu (bool): Whether or not this should launch a TPU training.
--ipex (bool): Whether or not this should launch an Intel PyTorch Extension (IPEX) training.
Resource Selection Arguments:
The following arguments are useful for fine-tuning how available hardware should be used:
--mixed_precision {no,fp16,bf16,fp8} (str): Whether or not to use mixed precision training. Choose between FP16 and BF16 (bfloat16) training. BF16 training is only supported on NVIDIA Ampere GPUs and PyTorch 1.10 or later.
--num_processes NUM_PROCESSES (int): The total number of processes to be launched in parallel.
--num_machines NUM_MACHINES (int): The total number of machines used in this training.
--num_cpu_threads_per_process NUM_CPU_THREADS_PER_PROCESS (int): The number of CPU threads per process. Can be tuned for optimal performance.
--enable_cpu_affinity (bool): Whether or not CPU affinity and balancing should be enabled. Currently only supported on NVIDIA hardware.
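The usage line for accelerate launch places launcher options before the training script and the script's own arguments after it. A minimal sketch of that ordering, assuming a hypothetical helper launch_command, a hypothetical script train.py, and a hypothetical script argument --epochs:

```python
import shlex
from typing import List, Optional, Sequence


def launch_command(training_script: str,
                   script_args: Sequence[str] = (),
                   config_file: Optional[str] = None,
                   num_processes: Optional[int] = None,
                   mixed_precision: Optional[str] = None) -> List[str]:
    """Build an `accelerate launch` argument list (hypothetical helper)."""
    cmd = ["accelerate", "launch"]
    # Launcher options go before the training script.
    if config_file is not None:
        cmd += ["--config_file", config_file]
    if num_processes is not None:
        cmd += ["--num_processes", str(num_processes)]
    if mixed_precision is not None:
        cmd += ["--mixed_precision", mixed_precision]
    # Everything after the script path is handed to the training script itself.
    cmd.append(training_script)
    cmd.extend(script_args)
    return cmd


cmd = launch_command(
    "train.py",                     # hypothetical script name
    script_args=["--epochs", "3"],  # hypothetical script arguments
    num_processes=2,
    mixed_precision="fp16",
)
print(shlex.join(cmd))
```

Options left as None are simply omitted, so their values would come from the config file as described above.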
Training Paradigm Arguments:
The following arguments are useful for selecting which training paradigm to use.
--use_deepspeed (bool): Whether or not to use DeepSpeed for training.
--use_fsdp (bool): Whether or not to use FullyShardedDataParallel for training.
--use_megatron_lm (bool): Whether or not to use Megatron-LM for training.
--use_xpu (bool): Whether to use IPEX plugin to speed up training on XPU specifically.
Distributed GPU Arguments:
The following arguments are only useful when multi_gpu is passed or multi-GPU training is configured through accelerate config:
--gpu_ids (str): What GPUs (by id) should be used for training on this machine, as a comma-separated list.
--same_network (bool): Whether all machines used for multinode training exist on the same local network.
--machine_rank (int): The rank of the machine on which this script is launched.
--main_process_ip (str): The IP address of the machine of rank 0.
--main_process_port (int): The port to use to communicate with the machine of rank 0.
-t, --tee (str): Tee std streams into a log file and also to console.
--log_dir (str): Base directory to use for log files when using torchrun/torch.distributed.run as launcher. Use with --tee to redirect std streams into log files.
--role (str): User-defined role for the workers.
--rdzv_backend (str): The rendezvous method to use, such as "static" (the default) or "c10d".
--rdzv_conf (str): Additional rendezvous configuration (<key1>=<value1>,<key2>=<value2>,...).
--max_restarts (int): Maximum number of worker group restarts before failing.
--monitor_interval (int): Interval, in seconds, to monitor the state of workers.
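For a multi-node run, each machine gets the same command except for its --machine_rank. The sketch below generates one command per machine. It is an illustration only: multinode_commands is a hypothetical helper, and the IP address, port, and script name are placeholders. Note that --num_processes is the total across all machines, as stated above.

```python
import shlex
from typing import List


def multinode_commands(num_machines: int,
                       processes_per_machine: int,
                       main_process_ip: str,
                       main_process_port: int,
                       training_script: str) -> List[List[str]]:
    """Return one `accelerate launch` argument list per machine (hypothetical helper)."""
    # --num_processes counts processes over all machines, not per machine.
    total_processes = num_machines * processes_per_machine
    commands = []
    for rank in range(num_machines):
        commands.append([
            "accelerate", "launch",
            "--multi_gpu",
            "--num_machines", str(num_machines),
            "--num_processes", str(total_processes),
            "--machine_rank", str(rank),  # the only value that differs per machine
            "--main_process_ip", main_process_ip,
            "--main_process_port", str(main_process_port),
            training_script,
        ])
    return commands


# Placeholder values: 2 machines with 4 processes each, rank 0 reachable at 10.0.0.1:29500.
commands = multinode_commands(2, 4, "10.0.0.1", 29500, "train.py")
for machine_cmd in commands:
    print(shlex.join(machine_cmd))
```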
TPU Arguments:
The following arguments are only useful when tpu is passed or TPU training is configured through accelerate config:
--tpu_cluster (bool): Whether to use a GCP TPU pod for training.
--tpu_use_sudo (bool): Whether to use sudo when running the TPU training script in each pod.
--vm (str): List of single Compute VM instance names. If not provided, usage of instance groups is assumed. For TPU pods.
--env (str): List of environment variables to set on the Compute VM instances. For TPU pods.
--main_training_function (str): The name of the main function to be executed in your script (only for TPU training).
--downcast_bf16 (bool): When using bf16 precision on TPUs, whether both float and double tensors are cast to bfloat16 or double tensors remain as float32.
DeepSpeed Arguments:
The following arguments are only useful when use_deepspeed is passed or deepspeed is configured through accelerate config:
--deepspeed_config_file (str): DeepSpeed config file.
--zero_stage (int): DeepSpeed's ZeRO optimization stage.
--offload_optimizer_device (str): Decides where (none|cpu|nvme) to offload optimizer states.
--offload_param_device (str): Decides where (none|cpu|nvme) to offload parameters.
--offload_optimizer_nvme_path (str): Decides the NVMe path to offload optimizer states to.
--gradient_accumulation_steps (int): Number of gradient accumulation steps used in your training script.
--gradient_clipping (float): Gradient clipping value used in your training script.
--zero3_init_flag (str): Decides whether (true|false) to enable deepspeed.zero.Init for constructing massive models. Only applicable with DeepSpeed ZeRO Stage-3.
--zero3_save_16bit_model (str): Decides whether (true|false) to save 16-bit model weights when using ZeRO Stage-3. Only applicable with DeepSpeed ZeRO Stage-3.
--deepspeed_hostfile (str): DeepSpeed hostfile for configuring multi-node compute resources.
--deepspeed_exclusion_filter (str): DeepSpeed exclusion filter string when using a multi-node setup.
--deepspeed_inclusion_filter (str): DeepSpeed inclusion filter string when using a multi-node setup.
--deepspeed_multinode_launcher (str): DeepSpeed multi-node launcher to use.
--deepspeed_moe_layer_cls_names (str): Comma-separated list of transformer MoE layer class names (case-sensitive) to wrap, e.g. MixtralSparseMoeBlock, Qwen2MoeSparseMoeBlock, JetMoEAttention,JetMoEBlock.
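Several DeepSpeed options take a fixed set of string values rather than acting as on/off switches. The following sketch shows one way to assemble them from Python: deepspeed_flags is a hypothetical helper, the device check only mirrors the none|cpu|nvme choices listed above, and the Stage-3 flag is added only for stage 3 because the text marks it as applicable there.

```python
import shlex
from typing import List

# Choices listed above for the two offload options.
OFFLOAD_DEVICES = ("none", "cpu", "nvme")


def deepspeed_flags(zero_stage: int,
                    offload_optimizer_device: str = "none",
                    offload_param_device: str = "none",
                    zero3_init: bool = False) -> List[str]:
    """Build DeepSpeed-related launcher flags (hypothetical helper)."""
    for device in (offload_optimizer_device, offload_param_device):
        if device not in OFFLOAD_DEVICES:
            raise ValueError(f"offload device must be one of {OFFLOAD_DEVICES}, got {device!r}")
    flags = [
        "--use_deepspeed",
        "--zero_stage", str(zero_stage),
        "--offload_optimizer_device", offload_optimizer_device,
        "--offload_param_device", offload_param_device,
    ]
    if zero_stage == 3:
        # This option is documented as a true|false string, not a bare switch.
        flags += ["--zero3_init_flag", "true" if zero3_init else "false"]
    return flags


# Placeholder script name; the flags go before it on the command line.
cmd = ["accelerate", "launch", *deepspeed_flags(2, offload_optimizer_device="cpu"), "train.py"]
print(shlex.join(cmd))
```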
Fully Sharded Data Parallelism Arguments:
The following arguments are only useful when use_fsdp is passed or Fully Sharded Data Parallelism is configured through accelerate config:
--fsdp_offload_params (str): Decides whether (true|false) to offload parameters and gradients to CPU.
--fsdp_min_num_params (int): FSDP's minimum number of parameters for Default Auto Wrapping.
--fsdp_sharding_strategy (int): FSDP's Sharding Strategy.
--fsdp_auto_wrap_policy (str): FSDP's auto wrap policy.
--fsdp_transformer_layer_cls_to_wrap (str): Transformer layer class name (case-sensitive) to wrap, e.g. BertLayer, GPTJBlock, T5Block ...
--fsdp_backward_prefetch_policy (str): FSDP's backward prefetch policy.
--fsdp_state_dict_type (str): FSDP's state dict type.
--fsdp_forward_prefetch (str): FSDP forward prefetch.
--fsdp_use_orig_params (str): If True, allows non-uniform requires_grad mixed in an FSDP unit.
--fsdp_cpu_ram_efficient_loading (str): If true, only the first process loads the pretrained model checkpoint while all other processes have empty weights. When using this, --fsdp_sync_module_states needs to be True.
--fsdp_sync_module_states (str): If true, each individually wrapped FSDP unit will broadcast module parameters from rank 0.
--fsdp_activation_checkpointing (bool): Decides whether intermediate activations are freed during the forward pass, and a checkpoint is left as a placeholder.
Megatron-LM Arguments:
The following arguments are only useful when use_megatron_lm is passed or Megatron-LM is configured through accelerate config:
--megatron_lm_tp_degree: Megatron-LM's Tensor Parallelism (TP) degree.
--megatron_lm_pp_degree: Megatron-LM's Pipeline Parallelism (PP) degree.
--megatron_lm_num_micro_batches: Megatron-LM's number of micro batches when PP degree > 1.
--megatron_lm_sequence_parallelism: Decides whether (true|false) to enable Sequence Parallelism when TP degree > 1.
--megatron_lm_recompute_activations: Decides whether (true|false) to enable Selective Activation Recomputation.
--megatron_lm_use_distributed_optimizer: Decides whether (true|false) to use distributed optimizer which shards optimizer state and gradients across Data Parallel (DP) ranks.
--megatron_lm_gradient_clipping: Megatron-LM's gradient clipping value based on global L2 Norm (0 to disable).
FP8 Arguments:
--fp8_backend (str): Choose a backend to train with FP8 (te or msamp).
--fp8_use_autocast_during_eval (bool): Whether to use FP8 autocast during eval mode (useful only when --fp8_backend=te is passed). Generally better metrics are found when this is not passed.
--fp8_margin (int): The margin to use for the gradient scaling (useful only when --fp8_backend=te is passed).
--fp8_interval (int): The interval to use for how often the scaling factor is recomputed (useful only when --fp8_backend=te is passed).
--fp8_format (str): The format to use for the FP8 recipe (useful only when --fp8_backend=te is passed).
--fp8_amax_history_len (int): The length of the history to use for the scaling factor computation (useful only when --fp8_backend=te is passed).
--fp8_amax_compute_algo (str): The algorithm to use for the scaling factor computation (useful only when --fp8_backend=te is passed).
--fp8_override_linear_precision (Tuple[bool, bool, bool]): Whether or not to execute fprop, dgrad, and wgrad GEMMs in higher precision.
--fp8_opt_level (str): What level of 8-bit collective communication should be used with MS-AMP (useful only when --fp8_backend=msamp is passed).
AWS SageMaker Arguments:
The following arguments are only useful when training in SageMaker:
--aws_access_key_id AWS_ACCESS_KEY_ID (str): The AWS_ACCESS_KEY_ID used to launch the Amazon SageMaker training job.
--aws_secret_access_key AWS_SECRET_ACCESS_KEY (str): The AWS_SECRET_ACCESS_KEY used to launch the Amazon SageMaker training job.
accelerate estimate-memory
Command:
accelerate estimate-memory or accelerate-estimate-memory or python -m accelerate.commands.estimate
Estimates the total vRAM that a particular model hosted on the Hub needs in order to be loaded, along with an estimate for training. Requires that huggingface_hub be installed.
When performing inference, typically add ≤20% to the result as overall allocation, as referenced here. We will have more extensive estimations in the future that will automatically be included in the calculation.
Usage:
accelerate estimate-memory {MODEL_NAME} --library_name {LIBRARY_NAME} --dtypes {dtype_1} {dtype_2} ...
Required Arguments:
MODEL_NAME (str): The model name on the Hugging Face Hub.
Optional Arguments:
--library_name {timm,transformers} (str): The library the model has an integration with, such as transformers, needed only if this information is not stored on the Hub.
--dtypes {float32,float16,int8,int4} ([{float32,float16,int8,int4} ...]): The dtypes to use for the model, must be one (or many) of float32, float16, int8, and int4.
--trust_remote_code (bool): Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be passed for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
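The usage line and the "add up to 20%" rule of thumb for inference can be illustrated together. This is a sketch, not part of Accelerate: both function names are hypothetical, bert-base-cased is just an example model name, and the 10 GB figure is a made-up input rather than a measured result.

```python
import shlex
from typing import List, Sequence

# Choices listed above for --dtypes.
DTYPES = ("float32", "float16", "int8", "int4")


def estimate_memory_command(model_name: str,
                            dtypes: Sequence[str] = (),
                            library_name: str = "") -> List[str]:
    """Build an `accelerate estimate-memory` argument list (hypothetical helper)."""
    unknown = [d for d in dtypes if d not in DTYPES]
    if unknown:
        raise ValueError(f"unsupported dtypes: {unknown}")
    cmd = ["accelerate", "estimate-memory", model_name]
    if library_name:
        cmd += ["--library_name", library_name]
    if dtypes:
        # --dtypes accepts one or many values after a single flag.
        cmd += ["--dtypes", *dtypes]
    return cmd


def inference_upper_bound(load_size_gb: float, overhead: float = 0.20) -> float:
    """Apply the 'add up to 20%' rule of thumb to a reported load size."""
    return load_size_gb * (1.0 + overhead)


cmd = estimate_memory_command("bert-base-cased", dtypes=["float32", "float16"],
                              library_name="transformers")
print(shlex.join(cmd))
# Made-up input: a reported 10 GB load size gives an upper bound of about 12 GB.
print(inference_upper_bound(10.0))
```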
accelerate tpu-config
accelerate tpu-config
Usage:
accelerate tpu-config [arguments]
Optional Arguments:
-h
,--help
(bool
) β Show a help message and exit
Config Arguments:
Arguments that can be configured through accelerate config.
--config_file (str): Path to the config file to use for accelerate.
--tpu_name (str): The name of the TPU to use. If not specified, will use the TPU specified in the config file.
--tpu_zone (str): The zone of the TPU to use. If not specified, will use the zone specified in the config file.
TPU Arguments:
Arguments for options run inside the TPU.
--command_file (str): The path to the file containing the commands to run on the pod on startup.
--command (str): A command to run on the pod. Can be passed multiple times.
--install_accelerate (bool): Whether to install accelerate on the pod. Defaults to False.
--accelerate_version (str): The version of accelerate to install on the pod. If not specified, will use the latest PyPI version. Specify "dev" to install from GitHub.
--debug (bool): If set, will print the command that would be run instead of running it.
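Because --command can be passed multiple times, a list of startup commands maps naturally onto repeated flags. A minimal sketch, assuming a hypothetical helper tpu_config_command and placeholder values for the TPU name, zone, and pod commands:

```python
import shlex
from typing import List, Optional, Sequence


def tpu_config_command(commands: Sequence[str],
                       tpu_name: Optional[str] = None,
                       tpu_zone: Optional[str] = None,
                       install_accelerate: bool = False,
                       debug: bool = False) -> List[str]:
    """Build an `accelerate tpu-config` argument list (hypothetical helper)."""
    cmd = ["accelerate", "tpu-config"]
    if tpu_name is not None:
        cmd += ["--tpu_name", tpu_name]
    if tpu_zone is not None:
        cmd += ["--tpu_zone", tpu_zone]
    # One --command flag per pod command.
    for pod_command in commands:
        cmd += ["--command", pod_command]
    if install_accelerate:
        cmd.append("--install_accelerate")
    if debug:
        # Per the description above, this prints the command instead of running it.
        cmd.append("--debug")
    return cmd


cmd = tpu_config_command(
    ["echo hello", "pip list"],  # placeholder pod commands
    tpu_name="my-tpu",           # placeholder TPU name
    tpu_zone="us-central1-a",    # placeholder zone
    debug=True,
)
# shlex.join quotes arguments that contain spaces, so the line can be pasted into a shell.
print(shlex.join(cmd))
```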
accelerate test
accelerate test or accelerate-test
Runs accelerate/test_utils/test_script.py to verify that 🤗 Accelerate has been properly configured on your system and runs.
Usage:
accelerate test [arguments]
Optional Arguments:
--config_file CONFIG_FILE (str): The path to use to store the config file. Will default to a file named default_config.yaml in the cache location, which is the content of the environment variable HF_HOME suffixed with "accelerate", or, if you don't have such an environment variable, your cache directory (~/.cache or the content of XDG_CACHE_HOME) suffixed with huggingface.
-h, --help (bool): Show a help message and exit.