Run fails on my macOS M3
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/bytedance/Public/Code/DragonBall/Octopus-v2/test.py", line 20, in
model = GemmaForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3531, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3958, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/transformers/modeling_utils.py", line 812, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/opt/homebrew/lib/python3.12/site-packages/accelerate/utils/modeling.py", line 399, in set_module_tensor_to_device
new_value = value.to(device)
^^^^^^^^^^^^^^^^
TypeError: BFloat16 is not supported on MPS
Try changing torch_dtype to float32.
We will provide a solution for this in the future. Sorry, we will work harder to accelerate this.
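A minimal sketch of that fix, assuming the NexaAIDev/Octopus-v2 checkpoint and the GemmaForCausalLM loader from the traceback above (the model ID and script layout here are assumptions, adjust them to your setup):

```python
import torch
from transformers import AutoTokenizer, GemmaForCausalLM

model_id = "NexaAIDev/Octopus-v2"  # assumption: replace with your repo ID or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GemmaForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # MPS has no bfloat16 support, so load the weights as float32
)
model.to("mps")  # move the model onto the Apple Silicon GPU
```

Note that float32 weights take roughly twice the memory of bfloat16, so make sure the machine has enough unified memory for the checkpoint.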
This helps, thanks @twhongyujiang.
I now get this error:
NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Which is sad...
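If you just want to get past that op for now, the fallback mentioned in the error message has to be enabled before torch is imported; a minimal sketch of how that might look:

```python
import os

# Enable CPU fallback for ops that are not yet implemented on MPS.
# This must be set before the first `import torch`.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402
# ... load and run the model as before; unsupported ops run on the CPU, which is slower.
```

Exporting PYTORCH_ENABLE_MPS_FALLBACK=1 in the shell before launching the script works as well.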
@Circle9
We have a GGUF converter (https://huggingface.co/spaces/NexaAIDev/gguf-convertor); please try converting the safetensors to GGUF and running it with Ollama. We have an example for Octopus-V4 in GGUF:
https://huggingface.co/NexaAIDev/octopus-v4-gguf
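Once the GGUF file has been registered with Ollama (for example via `ollama create` with a Modelfile pointing at the .gguf file), it can be queried from Python. A minimal sketch, assuming the official ollama Python client is installed and that the model was registered under the hypothetical name "octopus-v4"; the prompt is just a placeholder:

```python
import ollama  # assumption: `pip install ollama` and a local Ollama server running

# assumption: a model named "octopus-v4" was created from the downloaded GGUF file
response = ollama.chat(
    model="octopus-v4",
    messages=[{"role": "user", "content": "Take a selfie with the front camera"}],
)
print(response["message"]["content"])
```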