qnemo file
#6, opened by willy1212009
Has anyone run PTQ with the NeMo Framework to produce an FP8 or INT4 qnemo file for Nemotron-340B? The conversion is supposed to need 16 H100s or 8 H200s, but we don't have that hardware.
It's odd that we want to quantize to lower the hardware requirement, yet the quantization step itself needs 16 H100s first, lol.
The paper shows that the quantized model only needs 8 H100s.
https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/ptq.html
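A rough back-of-envelope sketch of why the GPU counts above come out the way they do. It counts weight memory only and assumes roughly 340e9 parameters, 80 GB per H100, and 141 GB per H200. The helper names are made up for illustration and are not part of NeMo.

```python
import math

# Assumptions (not taken from the NeMo docs): parameter count is ~340e9,
# and only the weights are counted (no activations, KV cache, or
# calibration overhead), so real requirements are higher than these numbers.
PARAMS = 340e9
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}
GPU_MEMORY_GB = {"H100": 80, "H200": 141}


def weight_memory_gb(precision):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9


def min_gpus(precision, gpu):
    """Lower bound on GPU count from weight memory alone."""
    return math.ceil(weight_memory_gb(precision) / GPU_MEMORY_GB[gpu])


def fits(precision, gpu, count):
    """True if the weights alone fit in the combined memory of `count` GPUs."""
    return weight_memory_gb(precision) <= GPU_MEMORY_GB[gpu] * count


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(
            f"{precision}: {weight_memory_gb(precision):.0f} GB of weights, "
            f"fits on 8xH100: {fits(precision, 'H100', 8)}, "
            f"fits on 8xH200: {fits(precision, 'H200', 8)}"
        )
```

Under these assumptions the BF16 weights come to about 680 GB, which is more than the 640 GB of one 8xH100 node but fits in 8 H200s. Calibration needs the unquantized model loaded, which would explain the 16 H100 requirement, while the FP8 weights (about 340 GB) fit on 8 H100s.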
There's some quantization work in progress, though I'm not sure about int4. It will be shared once fully validated.