safetensors shards of 2GB

#1
by antplsdev - opened

Hello,

Would it be possible to re-upload, or create a branch with, 2GB safetensors shards?

I would have converted the model to safetensors myself, but 10GB shards are impossible to convert to the safetensors format on an average configuration (<20GB of CPU RAM), and consequently the whole model is impossible to run on such hardware too.

As an example, you can take a look at https://huggingface.co/waifu-workshop/pygmalion-6b/tree/main
It has two branches: the original, with 10GB shards (which cannot run on a low-end configuration), and the "sharded" branch, with 2GB safetensors shards, which can.

2GB safetensors shards can be created by adding the following parameters to the save_pretrained function:

  • max_shard_size="2GB"
  • safe_serialization=True

as documented: https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/model#transformers.PreTrainedModel.save_pretrained.max_shard_size
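For reference, the full conversion could look like this sketch on a machine with enough CPU RAM (untested here; the checkpoint name is the local one used later in this thread, and low_cpu_mem_usage=True should keep peak usage near the fp16 model size instead of allocating the model twice):

from transformers import AutoModelForCausalLM
import torch

# Load the checkpoint in fp16 on CPU; low_cpu_mem_usage=True materializes
# the weights shard by shard instead of building the full model twice.
model = AutoModelForCausalLM.from_pretrained(
    "stablelm_oa_7b",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

# Re-save as 2GB safetensors shards, using the two parameters above.
model.save_pretrained("./sharded", max_shard_size="2GB", safe_serialization=True)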

Thank you

OpenAssistant org

It's a bit weird that a 10GB shard would take 20GB of memory. Are you sure you're loading the model in fp16? Either way, yes, I can upload a safetensors version, no problem.
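(For scale: 7B parameters come to roughly 14GB in fp16 but roughly 28GB in fp32, so an fp32 load would indeed blow past 20GB of RAM on its own.)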

I followed tips from:

The machine has 16GB of RAM, 2GB of swap, and 8GB of VRAM, plus dozens of GB available for the offload_folder.

In the end, I used the following script:

from transformers import AutoModelForCausalLM
import torch

checkpoint = "stablelm_oa_7b"

# Load in fp16, splitting the model between the GPU (7GB), CPU RAM (3GB),
# and disk offload for whatever doesn't fit.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "7GB", "cpu": "3GB"},
    offload_folder="/offload",
)

# Re-save as 2GB safetensors shards.
model.save_pretrained("./sharded", max_shard_size="2GB", safe_serialization=True)

(For some reason, I have to cap the CPU at 3GB or it OOMs.)

The model is indeed loaded, but saving it fails with:

Traceback (most recent call last):
  File "/src/run_convert.py", line 24, in <module>
    model.save_pretrained("./sharded", max_shard_size="2GB", safe_serialization=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1841, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 72, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 237, in _flatten
    return {
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 241, in <dictcomp>
    "data": _tobytes(v, k),
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 193, in _tobytes
    tensor = tensor.to("cpu")
NotImplementedError: Cannot copy out of meta tensor; no data!

It could be a bug in the safetensors library, although the traceback hints at the cause: the disk-offloaded weights live on the meta device, so there is no data to copy when save_pretrained tries to serialize them.
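One way to sidestep loading the model at all (a sketch of the idea, not a fix confirmed in this thread) is to convert the original .bin shards one at a time, so peak RAM stays around a single shard rather than the whole model. It keeps the original 10GB shard boundaries, and a matching model.safetensors.index.json with the new filenames would still have to be written:

import json
import os

import torch
from safetensors.torch import save_file

checkpoint = "stablelm_oa_7b"  # local directory holding the original .bin shards
out_dir = "sharded"
os.makedirs(out_dir, exist_ok=True)

# The index file maps every tensor name to the .bin shard that stores it.
with open(os.path.join(checkpoint, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

for shard_name in sorted(set(index["weight_map"].values())):
    # One shard (~10GB here) in memory at a time, never the full model.
    state_dict = torch.load(os.path.join(checkpoint, shard_name), map_location="cpu")
    # Cast to fp16; safetensors also requires contiguous (and non-shared) tensors.
    tensors = {k: v.half().contiguous() for k, v in state_dict.items()}
    out_name = shard_name.replace("pytorch_model", "model").replace(".bin", ".safetensors")
    save_file(tensors, os.path.join(out_dir, out_name), metadata={"format": "pt"})
    del state_dict, tensors  # release before loading the next shard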


Hey,
I am facing the same issue. Did you come up with any solution?

antplsdev changed discussion status to closed
