Add to HF Inference APIs
Hi,
It would be really useful to be able to use this through the Hugging Face Inference APIs (which would require this model to be compatible with Transformers). Are there any plans to add Transformers support to the model?
Thanks!
cc @reach-vb
Seconding this. Please have this model Transformers-ified. I would like to release a GPTQ quant for it, but I need an HF Transformers-compatible model.
Someone made the tokenizer Hugging Face-compatible, but I'm not sure how much that helps if the weights themselves are only available in the NeMo format: https://huggingface.co/Xenova/Nemotron-4-340B-Instruct-Tokenizer
Working on this here: https://huggingface.co/failspy/Nemotron-4-340B-Instruct-SafeTensors
Lacking an HF Transformers class for it as of now -- still working on that part if anyone wants to help. The weights are ported to be similar to Llama-3's arch (though not a perfect match; for example, the QKV projection is not split), along with a plausible hypothetical config.json. It also includes the tokenizer from @Xenova.
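For anyone poking at the port: a minimal sketch of what "splitting the QKV projection" means, i.e. slicing one fused weight into the separate q_proj / k_proj / v_proj tensors Llama-style code expects. The shapes here are illustrative, not Nemotron-4's real dimensions, and this assumes the fused tensor stacks Q, K, V as blocks along the output axis -- some checkpoints interleave per head instead, so verify the layout before relying on this.

```python
import numpy as np

hidden = 64  # illustrative hidden size, not Nemotron-4's
rng = np.random.default_rng(0)

# Assumed layout: fused weight is [Q; K; V] stacked along the output dim.
fused_qkv = rng.standard_normal((3 * hidden, hidden))

# Llama-style split into three separate projection weights.
q_w, k_w, v_w = np.split(fused_qkv, 3, axis=0)

# Sanity checks: each piece is (hidden, hidden) and re-stacking
# reproduces the original fused tensor exactly.
assert q_w.shape == k_w.shape == v_w.shape == (hidden, hidden)
assert np.array_equal(np.vstack([q_w, k_w, v_w]), fused_qkv)
```

If the checkpoint turns out to use a per-head interleaved layout (q1, k1, v1, q2, ...), the split would instead reshape to (n_heads, 3, head_dim, hidden) and slice along the second axis.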
+1
Hi all -- regarding inference APIs, you can use the model on https://build.nvidia.com/nvidia/nemotron-4-340b-instruct. There's an interactive widget there as well as an API you can use.
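A rough sketch of what a request to that API could look like. The endpoint URL, model id, and OpenAI-style chat schema below are assumptions based on the page linked above, not confirmed details -- check the site's own API docs before using them. The snippet only builds the request; sending it (e.g. with `requests.post`) needs a valid key.

```python
# Hypothetical request builder for the build.nvidia.com API.
# URL and model id are assumptions; verify against the official docs.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"  # assumed

def build_request(prompt: str, api_key: str):
    """Return (headers, body) for an OpenAI-style chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": "nvidia/nemotron-4-340b-instruct",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return headers, body

headers, body = build_request("Hello!", "nvapi-YOUR-KEY")
# Sending would then be:
#   import requests
#   r = requests.post(API_URL, headers=headers, json=body)
```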
@nealv +1 to fp8, as 8xA100 nodes are much more readily available than 16x at this time.
There's now a paid bounty to get this closed ASAP: $175 and growing.
https://x.com/natolambert/status/1814735390877884823