CUDA: Out of Memory Error while fine-tuning databricks/dolly-v2-3b

#22
by abhi24 - opened

My aim is to fine-tune dolly-v2-3b on my own data.

I'm running on AWS instance g5.48xlarge. It has 8 A10G Tensor Core GPUs.
I have set the following values for these options, as advised in the official repo:

@click.option("--epochs", type=int, default=1, help="Number of epochs to train for.")
@click.option("--per-device-train-batch-size", type=int, default=3, help="Batch size to use for training.")
@click.option("--per-device-eval-batch-size", type=int, default=3, help="Batch size to use for evaluation.")
@click.option("--bf16", type=bool, default=False, help="Whether to use bf16 (preferred on A100's).")

I haven't made any changes to the config file.
This is the command I use to run the trainer:
python3 training/trainer.py --local-output-dir "/home/ubuntu/train_progress" --warmup-steps 0 --deepspeed "/home/ubuntu/dolly/config/ds_z3_bf16_config.json"

Even after all this, I'm still running into a CUDA out-of-memory error. Am I missing something?

Please advise.

Databricks org

You aren't actually using DeepSpeed. That's one major difference, and likely the issue.
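(Context, as a hedged sketch: passing `--deepspeed` to a plain `python3` invocation only points the trainer at a config file; the DeepSpeed distributed runtime itself is started by the `deepspeed` launcher, which is what lets ZeRO stage 3 shard the model across all 8 GPUs. Paths below are taken from the command above; the exact flags accepted by trainer.py are the ones defined in the repo, and `--num_gpus`/`--bf16` values are assumptions for this instance type.)

```shell
# Launch trainer.py under the DeepSpeed runtime on all 8 A10G GPUs
# instead of invoking it with plain python3.
deepspeed --num_gpus=8 training/trainer.py \
    --local-output-dir "/home/ubuntu/train_progress" \
    --warmup-steps 0 \
    --deepspeed "/home/ubuntu/dolly/config/ds_z3_bf16_config.json" \
    --bf16 true
```

Enabling `--bf16` (an option already defined in the script above, default False) may also reduce memory pressure, since A10G GPUs support bfloat16.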

srowen changed discussion status to closed
