Russal committed
Commit 4670ca1
1 Parent(s): e114a8f

Update README.md

Files changed (1)
  1. README.md +52 -12

README.md CHANGED
@@ -15,20 +15,60 @@ inference: false
 Baichuan-13B-Instruction is the instruction-tuned version of the Baichuan-13B model series; the pretrained model is available at [Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base).
 
 
-## Usage
-
-The following is an example of chatting with Baichuan-13B-Chat. The expected output is "乔戈里峰。世界第二高峰———乔戈里峰西方登山者称其为k2峰,海拔高度是8611米,位于喀喇昆仑山脉的中巴边境上" (Mount Qogir, known to Western climbers as K2: the world's second-highest peak, 8,611 m, on the China-Pakistan border in the Karakoram range).
-```python
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
-from transformers.generation.utils import GenerationConfig
-
-# Load the tokenizer and model through Baichuan's remote code, in fp16 across available GPUs
-tokenizer = AutoTokenizer.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", use_fast=False, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
-model.generation_config = GenerationConfig.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction")
-
-# Ask: "Which is the second-highest mountain in the world?"
-messages = [{"role": "Human", "content": "世界上第二高的山峰是哪座"}]
-response = model.chat(tokenizer, messages)
-print(response)
-```
+## Demo
+
+The following is a Gradio demo for the model:
+
+```python
+import gradio as gr
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+tokenizer = AutoTokenizer.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", trust_remote_code=True, use_fast=False)
+model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", trust_remote_code=True).half()
+model.cuda()
+
+def generate(histories, max_new_tokens=2048, do_sample=True, top_p=0.95, temperature=0.35, repetition_penalty=1.1):
+    # Serialize the chat history into the "Human:/Assistant:" prompt format,
+    # e.g. a single turn becomes "\nHuman:<question>\n\nAssistant:<answer>"
+    prompt = ""
+    for history in histories:
+        history_with_identity = "\nHuman:" + history[0] + "\n\nAssistant:" + history[1]
+        prompt += history_with_identity
+    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
+    outputs = model.generate(
+        input_ids=input_ids,
+        max_new_tokens=max_new_tokens,
+        early_stopping=True,
+        do_sample=do_sample,
+        top_p=top_p,
+        temperature=temperature,
+        repetition_penalty=repetition_penalty,
+    )
+    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)
+    # Strip the echoed prompt so only the newly generated reply is returned
+    generate_text = rets[0].replace(prompt, "")
+    return generate_text
+
+with gr.Blocks() as demo:
+    chatbot = gr.Chatbot()
+    msg = gr.Textbox()
+    clear = gr.Button("clear")
+
+    def user(user_message, history):
+        # Append the user's turn with an empty assistant slot, then clear the textbox
+        return "", history + [[user_message, ""]]
+
+    def bot(history):
+        # Fill in the assistant slot of the latest turn
+        bot_message = generate(history)
+        history[-1][1] = bot_message
+        return history
+
+    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
+        bot, chatbot, chatbot
+    )
+    clear.click(lambda: None, None, chatbot, queue=False)
+
+if __name__ == "__main__":
+    demo.launch(server_name="0.0.0.0")
+```
 
 ## Quantized Deployment
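
The body of the Quantized Deployment (量化部署) section falls outside this hunk. As a reference point, here is a minimal int8 loading sketch; it assumes this fine-tune inherits the `quantize()` helper exposed by Baichuan-13B's `trust_remote_code` modeling file, which is an assumption for this particular checkpoint:

```python
# Sketch under an assumption: AlpachinoNLP/Baichuan-13B-Instruction is presumed
# to inherit the quantize() helper from Baichuan-13B's remote modeling code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "AlpachinoNLP/Baichuan-13B-Instruction", use_fast=False, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "AlpachinoNLP/Baichuan-13B-Instruction",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
# Weight-only int8 quantization, then move to GPU; this roughly halves
# GPU memory use relative to fp16 at some cost in output quality.
model = model.quantize(8).cuda()
```

Quantizing on the fly after an fp16 load mirrors the pattern documented in Baichuan-13B's own README; `model.quantize(4)` follows the same pattern for int4 if memory is tighter.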