Support Auto Device Map
I'm trying to run the model on multiple GPUs using device_map="auto", but I get "MPTForCausalLM does not support device_map='auto' yet".
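For context, a call along these lines reproduces it (the exact checkpoint id is an assumption; any MPT checkpoint that uses the custom MPTForCausalLM code hits the same error):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",       # assumed checkpoint, for illustration
    trust_remote_code=True,  # MPT ships custom modeling code on the Hub
    device_map="auto",       # raises: MPTForCausalLM does not support `device_map='auto'` yet
)
```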
Same here
I took a pass at this on the storywriter variant. I can't promise it's perfect, but I added the main block to modeling_mpt.py and made it support gradient checkpointing, and it seems to work fine. You can check out the commit/details here: https://huggingface.co/ethzanalytics/mpt-7b-storywriter-sharded/commit/0688e28bf6d9c7c0ee98a03628948f81eca2bdd6
There's also a Colab demo on the model card if you want to test it first: https://huggingface.co/ethzanalytics/mpt-7b-storywriter-sharded
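In case it helps, here's a rough sketch of how I'd load the sharded variant with device_map="auto"; the dtype, prompt, and generation settings are illustrative placeholders, and I've only been able to check this on a single GPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ethzanalytics/mpt-7b-storywriter-sharded"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # pulls in the modified modeling_mpt.py from the repo
    torch_dtype=torch.bfloat16,
    device_map="auto",        # lets accelerate place layers on whatever GPUs are visible
)

# Gradient checkpointing is only relevant for fine-tuning, but it can be
# switched on with the standard transformers API:
# model.gradient_checkpointing_enable()

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```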
Is there a plan to add device_map support?
@alex-laptiev it already works! That said, I've only verified it on a single GPU since I don't have a multi-GPU setup, so debugging the custom modeling code for multi-GPU is tricky without that. We're discussing that here.
Also, @jprafael replicated what I did on the storywriter and made an instruct version; you can see that here: https://huggingface.co/jprafael/mpt-7b-instruct-sharded
Hope that helps!
device_map is now supported with this PR: https://huggingface.co/mosaicml/mpt-7b-instruct/discussions/41
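So the standard call should work now against the upstream repo (a minimal sketch, assuming a recent transformers + accelerate install and that the model revision includes that PR):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    trust_remote_code=True,
    device_map="auto",        # accelerate decides the device placement
)
print(model.hf_device_map)    # inspect which module ended up on which device
```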