Multiple requests at once to quantized models
#3 opened by bilal-munirr
How can I make multiple requests to the model at once? I'm using Flask to build an API, but whenever a new user sends a prompt while another request is in progress, the API crashes. I've searched the web, and it seems the issue is with llama-cpp-python. Is there any alternative or workaround?
Use a semaphore and/or a queue to serialize access to the model: a single llama-cpp-python model instance is not safe for concurrent inference, so concurrent requests should wait their turn rather than call into the model simultaneously.
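A minimal sketch of the semaphore approach, using only the standard library. `run_inference` is a stand-in for your real `llama_cpp.Llama` call (the actual model object and Flask view are assumed and omitted here); the point is that every request acquires the semaphore before touching the model, so calls are serialized instead of crashing each other.

```python
import threading
import time

# llama-cpp-python is not safe for concurrent inference on a single model
# instance, so guard every call with a semaphore. A Semaphore(1) lets one
# request run at a time; the rest block and are served in turn.
model_semaphore = threading.Semaphore(1)

def run_inference(prompt: str) -> str:
    # Placeholder for the real call, e.g. llm(prompt) on a llama_cpp.Llama
    # instance created once at startup (hypothetical in this sketch).
    time.sleep(0.01)  # simulate generation latency
    return f"response to: {prompt}"

def handle_request(prompt: str, results: list) -> None:
    # In a Flask app, this body would live inside the view function.
    with model_semaphore:  # concurrent requests queue up here
        results.append(run_inference(prompt))

# Simulate four users hitting the endpoint at the same time.
results: list = []
threads = [threading.Thread(target=handle_request, args=(f"q{i}", results))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # all four requests served, one after another
```

The same idea scales to an explicit `queue.Queue` with a single worker thread if you want stricter FIFO ordering, or to a dedicated serving layer that batches requests, but a semaphore around the model call is usually enough to stop the crashes.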