safetensors shards of 2GB

#1
by antplsdev - opened

Hello,

Would it be possible to re-upload, or create a branch with, 2GB safetensors shards?

I would have converted the model to safetensors myself, but 10GB shards are impossible to convert to the safetensors format on an average configuration (<20GB of CPU RAM), and consequently the whole model is impossible to run on such hardware too.

As an example, you can take a look at https://huggingface.co/waifu-workshop/pygmalion-6b/tree/main
It has two branches: the original, with 10GB shards (which cannot run on a low-end configuration), and the "sharded" branch, with 2GB safetensors shards, which can.

2GB safetensors shards can be created by adding the following parameters to the save_pretrained function:

  • max_shard_size="2GB"
  • safe_serialization=True

as documented: https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/model#transformers.PreTrainedModel.save_pretrained.max_shard_size
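For reference, the full conversion could look like this sketch on a machine with enough CPU RAM (untested here; the checkpoint name is the local one used later in this thread, and low_cpu_mem_usage=True should keep peak usage near the fp16 model size instead of allocating the model twice):

from transformers import AutoModelForCausalLM
import torch

# Load the checkpoint in fp16 on CPU; low_cpu_mem_usage=True materializes
# the weights shard by shard instead of building the full model twice.
model = AutoModelForCausalLM.from_pretrained(
    "stablelm_oa_7b",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

# Re-save as 2GB safetensors shards, using the two parameters above.
model.save_pretrained("./sharded", max_shard_size="2GB", safe_serialization=True)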

Thank you

OpenAssistant org

It's a bit weird that a 10GB shard would take 20GB of memory. Are you sure you're loading the model in fp16? Either way, yes, I can upload a safetensors version, no problem.
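(For scale: 7B parameters come to roughly 14GB in fp16 but roughly 28GB in fp32, so an fp32 load would indeed blow past 20GB of RAM on its own.)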

I followed tips from:

The machine has 16GB of RAM, 2GB of swap, and 8GB of VRAM, plus dozens of GB available for the offload_folder.

In the end, I used the following script:

from transformers import AutoModelForCausalLM
import torch

checkpoint = "stablelm_oa_7b"

# Load in fp16, splitting the model between the GPU (7GB), CPU RAM (3GB),
# and disk offload for whatever doesn't fit.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "7GB", "cpu": "3GB"},
    offload_folder="/offload",
)

# Re-save as 2GB safetensors shards.
model.save_pretrained("./sharded", max_shard_size="2GB", safe_serialization=True)

(For some reason, I have to cap the CPU at 3GB or it OOMs.)

The model is indeed loaded, but saving it fails with:

Traceback (most recent call last):
  File "/src/run_convert.py", line 24, in <module>
    model.save_pretrained("./sharded", max_shard_size="2GB", safe_serialization=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1841, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 72, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 237, in _flatten
    return {
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 241, in <dictcomp>
    "data": _tobytes(v, k),
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 193, in _tobytes
    tensor = tensor.to("cpu")
NotImplementedError: Cannot copy out of meta tensor; no data!

It could be a bug in the safetensors library, although the traceback hints at the cause: the disk-offloaded weights live on the meta device, so there is no data to copy when save_pretrained tries to serialize them.
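One way to sidestep loading the model at all (a sketch of the idea, not a fix confirmed in this thread) is to convert the original .bin shards one at a time, so peak RAM stays around a single shard rather than the whole model. It keeps the original 10GB shard boundaries, and a matching model.safetensors.index.json with the new filenames would still have to be written:

import json
import os

import torch
from safetensors.torch import save_file

checkpoint = "stablelm_oa_7b"  # local directory holding the original .bin shards
out_dir = "sharded"
os.makedirs(out_dir, exist_ok=True)

# The index file maps every tensor name to the .bin shard that stores it.
with open(os.path.join(checkpoint, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

for shard_name in sorted(set(index["weight_map"].values())):
    # One shard (~10GB here) in memory at a time, never the full model.
    state_dict = torch.load(os.path.join(checkpoint, shard_name), map_location="cpu")
    # Cast to fp16; safetensors also requires contiguous (and non-shared) tensors.
    tensors = {k: v.half().contiguous() for k, v in state_dict.items()}
    out_name = shard_name.replace("pytorch_model", "model").replace(".bin", ".safetensors")
    save_file(tensors, os.path.join(out_dir, out_name), metadata={"format": "pt"})
    del state_dict, tensors  # release before loading the next shard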


Hey,
I am facing the same issue. Did you come up with any solution?

antplsdev changed discussion status to closed
