Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
andrewrreed 
posted an update Feb 8, 2024
Post
🚀 It's now easier than ever to switch from OpenAI to open LLMs

Hugging Face's TGI now supports an OpenAI compatible Chat Completion API

This means you can transition code that uses OpenAI client libraries (or frameworks like LangChain 🦜 and LlamaIndex 🦙) to run open models by changing just two lines of code 🤗

⭐ Here's how:
from openai import OpenAI

# initialize the client but point it to TGI
client = OpenAI(
    base_url="<ENDPOINT_URL>" + "/v1/",  # replace with your endpoint url
    api_key="<HF_API_TOKEN>",  # replace with your token
)
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500
)

# iterate and print stream
for message in chat_completion:
    print(message.choices[0].delta.content, end="")


🔗 Blog post ➡ https://huggingface.co/blog/tgi-messages-api
🔗 TGI docs ➡ https://huggingface.co/docs/text-generation-inference/en/messages_api

very cool! This is activated on a new /chat/completions endpoint in text-generation-inference (in addition to the existing endpoint), correct?

·

NICE! Does this apply to all models in serverless and deployed endpoints, or just models that have a correct chat_template in tokenizer_config.json?

·

The latter, just those with a chat template set