sh2orc/Llama-3.1-Korean-8B-Instruct

sh2orc committed on Aug 4

Commit 05774d3 • 1 Parent(s): 5f061ea

Update README.md

Files changed (1)

README.md +14 -0

@@ -49,7 +49,21 @@ print(outputs[0]["generated_text"])
 ## 💻 Usage for VLLM
+Use with vLLM
+With an up-to-date ```vllm``` installation, you can run conversational inference through vLLM using the gen() function.
+Make sure to update your vllm installation via ```pip install --upgrade vllm```.
 ```python
+from vllm import LLM, SamplingParams
+from transformers import AutoTokenizer, pipeline
+BASE_MODEL = "sh2orc/Llama-3.1-Korean-8B-Instruct"
+llm = LLM(model=BASE_MODEL)
+tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
+tokenizer.pad_token = tokenizer.eos_token
+tokenizer.padding_side = 'right'
 def gen(instruction):
     messages = [