70B AWQ model?

#2
by Teja-Gollapudi - opened

Thanks for AWQ-quantizing these models!
Do you plan on releasing a Llama-2-70b-chat-hf AWQ model soon?

Llama 2 70B (unlike the 7B and 13B variants) uses grouped-query attention (GQA) rather than standard multi-head attention. The AWQ paper authors are working on adding support for GQA (link).
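
For context, here is a minimal sketch of how GQA differs from MHA (the head counts and dimensions below are illustrative, not the exact 70B configuration): each group of query heads shares a single key/value head, so the k/v projections are smaller than the query projection, which is what the quantization code has to account for.

```python
# Minimal sketch of grouped-query attention (GQA) vs. multi-head attention (MHA).
# Dimensions are illustrative only, not the exact Llama 2 70B configuration.
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 16, 128
n_q_heads = 8    # query heads
n_kv_heads = 2   # key/value heads (GQA: fewer than query heads; MHA: equal)

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Each group of query heads shares one key/value head, so the smaller
# K/V tensors are repeated along the head dimension to match the queries.
group_size = n_q_heads // n_kv_heads
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # (batch, n_q_heads, seq_len, head_dim)
```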

abhinavkulkarni changed discussion status to closed
