
Running inference outside of Triton

#6
by lbathen - opened

Hi, do you have sample code showing how to run the model outside of the Triton environment?

NVIDIA org

Hi, in our existing code we use Triton only as an orchestrator that batches client-side requests into batch sizes of 2 or larger. Triton is therefore not strictly required: the inference backend itself runs on NeMo-Aligner/NeMo. We don't have sample code for this at the moment, though it should be straightforward to implement. If you need more help, could you clarify what kind of environment you plan to run the model in?
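For what it's worth, here is a minimal sketch of direct (Triton-free) scoring in Python, modeled on the loading recipe in NeMo-Aligner's examples/nlp/gpt/serve_reward_model.py. The checkpoint path, the prompt template, and the exact signatures of load_and_override_model_config, load_from_nemo, and infer are assumptions that can vary between NeMo-Aligner releases, so check each name against your installed version:

```python
# Hypothetical sketch: score a chat-formatted sample with the SteerLM reward
# model directly through NeMo-Aligner, with no (Py)Triton server in the loop.
# Names below follow NeMo-Aligner's serve_reward_model.py example and may
# differ in your release -- verify before relying on them.
import torch
from omegaconf import OmegaConf
from pytorch_lightning import Trainer
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

from nemo_aligner.models.nlp.gpt.reward_model_classes import (
    REWARD_MODEL_CLASS_DICT,
    RewardModelType,
)
from nemo_aligner.utils.utils import load_and_override_model_config, load_from_nemo

RM_PATH = "/path/to/steerlm_reward_model.nemo"  # placeholder checkpoint path

# Read the config stored inside the .nemo checkpoint (no overrides here).
model_cfg = load_and_override_model_config(RM_PATH, OmegaConf.create({}))

# SteerLM reward models are the multi-attribute "regression" variant.
reward_model_cls = REWARD_MODEL_CLASS_DICT[RewardModelType.REGRESSION]

trainer = Trainer(strategy=NLPDDPStrategy(), devices=1, accelerator="gpu")
model = load_from_nemo(
    reward_model_cls, model_cfg, trainer, strict=True, restore_path=RM_PATH
)
model.freeze()

# Tokenize one sample and call the model's infer path directly. The
# (token_ids, lengths) tuple mirrors what the Triton callable passes to
# infer(); confirm the expected input format in your version.
text = "<extra_id_0>System\n\n<extra_id_1>User\nHello!\n<extra_id_2>"
ids = model.tokenizer.text_to_ids(text)
batch = (torch.tensor([ids]), torch.tensor([len(ids)]))

with torch.no_grad():
    rewards = model.infer(batch)  # per-attribute scores for the regression RM
print(rewards)
```

The point of the sketch is just that the batching Triton performs is an optimization, not a requirement: loading the checkpoint and calling the model's own inference function reproduces what the server does for a single request.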

I had an old version of NeMo-Aligner :) I see newer code that will help with this task. Thank you.

lbathen changed discussion status to closed
