safetensors shards of 2GB
Hello,
Would it be possible to re-upload the model, or create a branch, with a shard size of 2GB in safetensors format?
I would have converted it to safetensors myself, but 10GB shards are impossible to convert to the safetensors format on an average configuration (<20GB of CPU RAM), and as a result the entire model is impossible to run as well.
As an example, you can take a look at https://huggingface.co/waifu-workshop/pygmalion-6b/tree/main
It has two branches: the original one with 10GB shards (which cannot be run on a low-end configuration), and the "sharded" branch with 2GB safetensors shards, which can run on a low-end configuration.
2GB safetensors shards can be created by passing the following parameters to the save_pretrained function:
- max_shard_size="2GB"
- safe_serialization=True
as documented : https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/model#transformers.PreTrainedModel.save_pretrained.max_shard_size
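For reference, here is a minimal sketch of what such a conversion could look like, assuming the machine has enough CPU RAM to hold the fp16 weights (the checkpoint path is a placeholder for a local clone of this repo):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder path to a local clone of the original checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/original_checkpoint",
    torch_dtype=torch.float16,
)
# Re-save as ~2GB safetensors shards.
model.save_pretrained("./sharded", max_shard_size="2GB", safe_serialization=True)
```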
Thank you
It's a bit weird that a 10GB shard would take 20GB of memory. Are you sure you're loading the model in fp16? Either way, yes, I can upload a safetensors version, no problem.
I followed the tips from:
- https://huggingface.co/docs/transformers/main/en/main_classes/quantization
- https://huggingface.co/docs/accelerate/usage_guides/big_modeling
The machine has 16GB of RAM, 2GB of swap, and 8GB of VRAM, plus dozens of GB of disk space for the offload_folder.
In the end, I used the following script:
from transformers import AutoModelForCausalLM
import torch

checkpoint = "stablelm_oa_7b"
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "7GB", "cpu": "3GB"},
    offload_folder="/offload",
)
model.save_pretrained("./sharded", max_shard_size="2GB", safe_serialization=True)
(For some reason I had to cap the CPU at 3GB or it would OOM.)
The model does load, but saving it fails with:
Traceback (most recent call last):
  File "/src/run_convert.py", line 24, in <module>
    model.save_pretrained("./sharded", max_shard_size="2GB", safe_serialization=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1841, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 72, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 237, in _flatten
    return {
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 241, in <dictcomp>
    "data": _tobytes(v, k),
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 193, in _tobytes
    tensor = tensor.to("cpu")
NotImplementedError: Cannot copy out of meta tensor; no data!
It could be a bug in the safetensors library.
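One possible workaround, in case it helps: the error seems to come from the disk-offloaded weights, which sit on the meta device and therefore have no data to copy back to CPU. The conversion can be done without instantiating the model at all, by streaming the original PyTorch shards one at a time and writing ~2GB safetensors files directly. Below is a rough, untested sketch: the source/destination paths are placeholders, it assumes the standard pytorch_model-*.bin sharding layout, it needs roughly one original shard plus one output shard of RAM, and the output file names may need renaming to the usual model-XXXXX-of-YYYYY.safetensors convention before from_pretrained will pick them up.

```python
import json
import os

import torch
from safetensors.torch import save_file

src = "stablelm_oa_7b"   # directory containing pytorch_model-*.bin and its index (assumed)
dst = "./sharded"        # output directory for the safetensors shards
os.makedirs(dst, exist_ok=True)

max_bytes = 2 * 1024**3  # target shard size (~2GB)
buffer, buffered_bytes, weight_map, shard_id = {}, 0, {}, 0

def flush(tensors, shard_id):
    """Write the buffered tensors as one safetensors shard and record their location."""
    name = f"model-{shard_id:05d}.safetensors"
    save_file(tensors, os.path.join(dst, name), metadata={"format": "pt"})
    for key in tensors:
        weight_map[key] = name

with open(os.path.join(src, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

for bin_file in sorted(set(index["weight_map"].values())):
    # Load one original shard at a time on CPU.
    state = torch.load(os.path.join(src, bin_file), map_location="cpu")
    for key, tensor in state.items():
        # Cast to fp16 (drop this line to keep the original dtype); clone to avoid shared storage.
        tensor = tensor.to(torch.float16).clone()
        size = tensor.numel() * tensor.element_size()
        if buffer and buffered_bytes + size > max_bytes:
            flush(buffer, shard_id)
            buffer, buffered_bytes, shard_id = {}, 0, shard_id + 1
        buffer[key] = tensor
        buffered_bytes += size
    del state  # free the original shard before loading the next one

if buffer:
    flush(buffer, shard_id)

# Write the index so the sharded safetensors checkpoint can be resolved.
with open(os.path.join(dst, "model.safetensors.index.json"), "w") as f:
    json.dump({"metadata": {}, "weight_map": weight_map}, f, indent=2)
```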
Hey,
I am facing the same issue. Did you come up with any solution?