shibing624 committed · Commit ee443b3 · Parent(s): 1fa90a2

Update README.md
README.md CHANGED
````diff
@@ -80,7 +80,7 @@ The following `bitsandbytes` quantization config was used during training:
 - [shibing624/textgen](https://github.com/shibing624/textgen)
 - [shibing624/MedicalGPT](https://github.com/shibing624/MedicalGPT)
 
-Use the textgen library ([textgen](https://github.com/shibing624/textgen)), which can run LLaMA models:
+Use the textgen library ([textgen](https://github.com/shibing624/textgen)), which can run Baichuan/LLaMA models:
 
 Install package:
 ```shell
@@ -114,10 +114,10 @@ import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
 
 
-model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map='auto', trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map='auto', torch_dtype=torch.float16, trust_remote_code=True)
 model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)
-device =
+device = torch.device(0) if torch.cuda.is_available() else torch.device("cpu")
 
 def generate_prompt(instruction):
     return f"""A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.USER: {instruction} ASSISTANT: """
@@ -129,9 +129,9 @@ for s in sents:
     inputs = tokenizer(q, return_tensors="pt")
     inputs = inputs.to(device)
 
-    generate_ids =
+    generate_ids = model.generate(
         **inputs,
-        max_new_tokens=
+        max_new_tokens=512,
     )
 
     output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
@@ -164,7 +164,7 @@ vicuna-baichuan-13b-chat
 └── tokenizer.model
 ```
 
-
+- Inference GPU: 27 GB
 ### Inference Examples
````
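The hunks above show only the fragments that changed. Stitched together, the inference snippet as it reads after this commit is roughly the sketch below. The `sents` test sentence, the `q = generate_prompt(s)` line, and the final `print` fall outside the hunks, so they are assumptions for illustration, not part of the commit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Load Baichuan-13B-Chat in float16, sharded across the visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat",
    device_map='auto',
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
model.generation_config = GenerationConfig.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True
)
device = torch.device(0) if torch.cuda.is_available() else torch.device("cpu")


def generate_prompt(instruction):
    return f"""A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.USER: {instruction} ASSISTANT: """


sents = ["介绍下北京"]  # hypothetical test input; the real list sits outside the hunks
for s in sents:
    q = generate_prompt(s)  # assumed: the hunk only shows tokenizer(q, ...)
    inputs = tokenizer(q, return_tensors="pt")
    inputs = inputs.to(device)

    generate_ids = model.generate(
        **inputs,
        max_new_tokens=512,
    )

    output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
    print(output)  # assumed
```

With `device_map='auto'` the weights are dispatched across the available GPUs, so the inputs only need to reach the first device; that is what moving them to `torch.device(0)` accomplishes here.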
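As a sanity check on the new "Inference GPU: 27 GB" line: with `torch_dtype=torch.float16`, 13B parameters × 2 bytes comes to about 26 GB for the weights alone, so a footprint of roughly 27 GB including the KV cache and activations at modest generation lengths is plausible.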