
Running inference outside of Triton

#6
by lbathen - opened

Hi, do you have sample code showing how to run the model outside of the Triton environment?

NVIDIA org

Hi, in our existing code we use Triton only as an orchestrator that batches client-side requests into batch sizes of 2 or larger. Triton is therefore not strictly required: the inference backend itself runs on NeMo-Aligner/NeMo. We don't have sample code for this at the moment, though it should be straightforward to implement. If you need more help, could you clarify what kind of environment you plan to run the model in?
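For what it's worth, here is a minimal sketch of direct (Triton-free) scoring in Python, modeled on the loading recipe in NeMo-Aligner's examples/nlp/gpt/serve_reward_model.py. The checkpoint path, the prompt template, and the exact signatures of load_and_override_model_config, load_from_nemo, and infer are assumptions that can vary between NeMo-Aligner releases, so check each name against your installed version:

```python
# Hypothetical sketch: score a chat-formatted sample with the SteerLM reward
# model directly through NeMo-Aligner, with no (Py)Triton server in the loop.
# Names below follow NeMo-Aligner's serve_reward_model.py example and may
# differ in your release -- verify before relying on them.
import torch
from omegaconf import OmegaConf
from pytorch_lightning import Trainer
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

from nemo_aligner.models.nlp.gpt.reward_model_classes import (
    REWARD_MODEL_CLASS_DICT,
    RewardModelType,
)
from nemo_aligner.utils.utils import load_and_override_model_config, load_from_nemo

RM_PATH = "/path/to/steerlm_reward_model.nemo"  # placeholder checkpoint path

# Read the config stored inside the .nemo checkpoint (no overrides here).
model_cfg = load_and_override_model_config(RM_PATH, OmegaConf.create({}))

# SteerLM reward models are the multi-attribute "regression" variant.
reward_model_cls = REWARD_MODEL_CLASS_DICT[RewardModelType.REGRESSION]

trainer = Trainer(strategy=NLPDDPStrategy(), devices=1, accelerator="gpu")
model = load_from_nemo(
    reward_model_cls, model_cfg, trainer, strict=True, restore_path=RM_PATH
)
model.freeze()

# Tokenize one sample and call the model's infer path directly. The
# (token_ids, lengths) tuple mirrors what the Triton callable passes to
# infer(); confirm the expected input format in your version.
text = "<extra_id_0>System\n\n<extra_id_1>User\nHello!\n<extra_id_2>"
ids = model.tokenizer.text_to_ids(text)
batch = (torch.tensor([ids]), torch.tensor([len(ids)]))

with torch.no_grad():
    rewards = model.infer(batch)  # per-attribute scores for the regression RM
print(rewards)
```

The point of the sketch is just that the batching Triton performs is an optimization, not a requirement: loading the checkpoint and calling the model's own inference function reproduces what the server does for a single request.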

I had an old version of NeMo-Aligner :) I see newer code that will help with this task. Thank you.

lbathen changed discussion status to closed
