Not sure about hyperparam `test-size` during fine-tuning
Hello,
I'm fine-tuning Dolly with my own data.
However, there's a `test-size` hyperparameter which I'm not sure about. I can't find any mention of it in their repo or on PyPI.
# https://github.com/databrickslabs/dolly#a10-gpus-1
!deepspeed {num_gpus_flag} \
--module training.trainer \
--input-model {input_model} \
--deepspeed {deepspeed_config} \
--epochs 2 \
--local-output-dir {local_output_dir} \
--dbfs-output-dir {dbfs_output_dir} \
--per-device-train-batch-size 3 \
--per-device-eval-batch-size 3 \
--logging-steps 10 \
--save-steps 200 \
--save-total-limit 20 \
--eval-steps 50 \
--warmup-steps 50 \
--test-size 200 \
--lr 5e-6
My training set is 1,000 datapoints. What should the hyperparameters, especially `test-size`, be to suit that training size?
Most of the actual training configuration is in the HF Trainer: https://github.com/databrickslabs/dolly/blob/master/training/trainer.py#L236
These arguments to deepspeed tell it about the training too, so it's sometimes a little repetitive. Here I guess deepspeed also wants to know how big the test set is. It's possible it's actually redundant; I haven't looked closely.
Thanks for the pointers, Sean. It might very well be redundant.
Oh wait, I'm misreading this. `--test-size` is how you pass the argument through deepspeed down to the Trainer. It's not redundant; it's just saying how much of the dataset to hold out for eval.
With only 1000 data points the test size may not be very useful here. I would consider just setting it very low (like 1 or 10) and perhaps ignoring the eval loss. You could try running generation at different checkpoints to see how the quality actually looks.
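For context, `--test-size` just controls a random holdout split. A stdlib-only sketch of that idea (this is illustrative, not the actual Trainer code):

```python
import random

def split_holdout(records, test_size, seed=42):
    """Randomly hold out `test_size` records for eval; the rest train."""
    rng = random.Random(seed)
    shuffled = records[:]          # don't mutate the caller's list
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]

# With 1,000 datapoints, --test-size 200 would eat 20% of your data:
records = [{"instruction": f"q{i}", "response": f"a{i}"} for i in range(1000)]
train, test = split_holdout(records, test_size=200)
print(len(train), len(test))  # 800 200
```

So with a small dataset, a large `--test-size` directly shrinks what you train on, which is another reason to keep it low.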
Thanks. I see the dolly-15k dataset only has "train". Mine has "train" and "test". I suppose I have to put it all in "train", for a few more data points.
You can also just modify the code to load your train and test sets, instead of randomly splitting test out of train.
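If your data already has separate splits, one way to go is to load two JSONL files yourself rather than calling the random split. A minimal stdlib sketch (the file names are placeholders for your own paths, and `load_jsonl` is a hypothetical helper, not something in the dolly repo):

```python
import json

def load_jsonl(path):
    """Load one JSON record per line from a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Placeholder paths -- point these at your own split files:
# train_records = load_jsonl("train.jsonl")
# test_records = load_jsonl("test.jsonl")
```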
Deepspeed exited with code -9 on a fine-tuning run against the 7b model. Exit code -9 means the process was killed with SIGKILL, which usually indicates it ran out of memory. (Using 4x A10 GPUs.)
I'll look into trying Parameter-Efficient Tuning.
Or perhaps try with 8x A100 (p4d instance).
What model size are you using, and what instance? That should not be needed. https://github.com/databrickslabs/dolly#a10-gpus
I'm trying to fine-tune the 7b model with learning rate 5e-8 (to not clobber the weights too much) and 2 epochs.
Here's the log.
I use 4 x A10G GPUs:
Python 3.9.5
torch: 1.13 ; cuda: cu117
Mon May 1 12:23:15 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10G Off | 00000000:00:1B.0 Off | 0 |
| 0% 19C P8 9W / 300W | 0MiB / 22731MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A10G Off | 00000000:00:1C.0 Off | 0 |
| 0% 18C P8 9W / 300W | 0MiB / 22731MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A10G Off | 00000000:00:1D.0 Off | 0 |
| 0% 19C P8 9W / 300W | 0MiB / 22731MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A10G Off | 00000000:00:1E.0 Off | 0 |
| 0% 19C P8 8W / 300W | 0MiB / 22731MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
The instance:
Learning rate is not related to memory usage. Are you following the instructions in the repo for changing the code when using A10s? It will not work out of the box unless you modify some settings as described there.
Yes, I'm following this exactly: https://github.com/databrickslabs/dolly#a10-gpus-1
Oh, you have a g5.12xlarge. Try a g5.24xlarge. I think that's not enough RAM to load the model 4x into memory.
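A rough back-of-the-envelope on why this happens (assuming a ~6.9B-parameter base model in fp16, and each of the 4 deepspeed ranks materializing its own copy of the weights while loading; these numbers are estimates, not measurements):

```python
# Approximate host-RAM pressure when 4 ranks each load a ~7B fp16 model.
params = 6.9e9            # parameter count (approximate)
bytes_per_param = 2       # fp16
copies = 4                # one copy per GPU/rank during loading
total_gib = params * bytes_per_param * copies / 2**30
print(f"~{total_gib:.0f} GiB just for the weights")  # ~51 GiB
```

That is only the raw weights; fp32 master copies, optimizer states, and activation buffers multiply it several times over, which is why stepping up to an instance with more host RAM (the g5.24xlarge has twice as much as the g5.12xlarge) resolves the SIGKILL.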
Ah, sorry - yes, that worked, and it took 5 hours to fine-tune my model. Thanks for your help!
Hello opyate!
I'm also looking to fine tune dolly on ec2 g5.24xlarge instance.
!deepspeed {num_gpus_flag} \
--module training.trainer \
--input-model {input_model} \
--deepspeed {deepspeed_config} \
--epochs 2 \
--local-output-dir {local_output_dir} \
--dbfs-output-dir {dbfs_output_dir} \
--per-device-train-batch-size 3 \
--per-device-eval-batch-size 3 \
--logging-steps 10 \
--save-steps 200 \
--save-total-limit 20 \
--eval-steps 50 \
--warmup-steps 50 \
--test-size 200 \
--lr 5e-6
Did you run this command directly in the terminal, or was it part of another file? How are the values for the variables in {} passed?
I'm looking into tutorials for using deepspeed but haven't been able to crack it. It would be great if you shared how you used deepspeed to fine-tune Dolly!
One tutorial mentioned running `accelerate config` first and answering a bunch of questions. Is that the way to proceed?
Thanks,
Abhilash
Did you run this command directly in the terminal, or was it part of another file? How are the values for the variables in {} passed?
Hi, you can clone the dolly repo into Databricks, then open this notebook, and it's all there. Then just follow the extra guidance for A10 GPUs.
deepspeed docs: https://deepspeed.readthedocs.io/en/latest/
accelerate is a different library.
You already have a working example linked from this model card: https://github.com/databrickslabs/dolly
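To answer the literal question: in a Databricks/IPython notebook, a leading `!` runs the line as a shell command, and `{var}` interpolates Python variables defined earlier in the notebook. Run from a plain terminal instead, the same command looks roughly like the sketch below, where every value is a placeholder you would set to match your own setup (the model name, config path, and output dirs here are illustrative, not the repo's exact defaults):

```shell
# All values below are placeholders standing in for the notebook's {var}
# interpolation -- adjust them for your model size and machine.
INPUT_MODEL="EleutherAI/pythia-2.8b"      # base model to fine-tune
DS_CONFIG="path/to/ds_config.json"        # a deepspeed config from the repo
LOCAL_OUT="/local_disk0/dolly_training"   # where checkpoints are written

deepspeed --num_gpus=4 \
  --module training.trainer \
  --input-model "$INPUT_MODEL" \
  --deepspeed "$DS_CONFIG" \
  --epochs 2 \
  --local-output-dir "$LOCAL_OUT" \
  --per-device-train-batch-size 3 \
  --per-device-eval-batch-size 3 \
  --logging-steps 10 \
  --save-steps 200 \
  --save-total-limit 20 \
  --eval-steps 50 \
  --warmup-steps 50 \
  --test-size 200 \
  --lr 5e-6
```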
Many thanks, both of you! I have been able to train the dolly-v2-3b model on the 15k dataset. It has reached epoch = 0.41 and I hope it doesn't run into any errors.
My original aim, though, was to fine-tune dolly-v2-3b on my custom data (summarisation/extraction). I have the data ready in CSV format; I just have to adapt it to the JSONL format.
- The dolly-15k data has 4 fields: {"instruction": "", "context": "", "response": "", "category": ""}. Is it okay to leave some of them blank?
- My data has many "\n" characters in it. Should I get rid of them?
Anything else to look out for?
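For the CSV-to-JSONL step, a stdlib sketch along these lines should do (the column names `instruction`/`context`/`response` are assumptions about your CSV; note that keeping `\n` inside the values is fine, since `json.dumps` escapes them and each record still stays on one line):

```python
import csv
import json

def csv_to_jsonl(csv_path, jsonl_path):
    """Convert a CSV with instruction/context/response columns to JSONL."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            record = {
                "instruction": row["instruction"],
                "context": row.get("context", ""),   # blank context is OK
                "response": row["response"],
            }
            # json.dumps escapes embedded newlines as \n, so the
            # one-record-per-line invariant of JSONL is preserved.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```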
Thanks
Category is actually unused. Context can be blank, yes; you can see that in some entries. You can see how it turns the fields into a string with a prompt here: https://github.com/databrickslabs/dolly/blob/master/training/trainer.py#L109 (You could even change the code to do whatever you want; in the end, all you are feeding the model is strings.)
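The gist of that field-to-string assembly can be sketched like this (the exact template lives in trainer.py at the link above; the wording and section headers here are illustrative, not the repo's verbatim prompt):

```python
def format_record(instruction, response, context=""):
    """Assemble one training string from the record's fields.

    Context may be blank, in which case that section is simply omitted --
    mirroring how blank-context entries work in the dolly-15k data.
    """
    if context:
        return (f"### Instruction:\n{instruction}\n\n"
                f"### Context:\n{context}\n\n"
                f"### Response:\n{response}")
    return (f"### Instruction:\n{instruction}\n\n"
            f"### Response:\n{response}")

print(format_record("Summarise the passage.", "A short summary."))
```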
Thanks!