---
title: TransGPT-7b
emoji: 📚
colorFrom: gray
colorTo: red
language:
- zh
tags:
- chatglm
- pytorch
- zh
- Text2Text-Generation
license: "other"
widget:
- text: "我想了解如何申请和更新驾驶证?"
---
# TransGPT
**Release of the Chinese TransGPT (7B) model**
Test case:
|input_text|predict|
|:-- |:--- |
|我想了解如何申请和更新驾驶证?|你可以到当地的交通管理部门或者公安局办理相关手续。具体流程可以在官方网站上查询。|
# File Checksums
```
md5sum ./*
```
```
e618653f90f163928316858e95bd54d1 ./config.json
b1eb3650cbc84466fed263a9f0dff5e2 ./generation_config.json
570159d90b39554713e9702b9107928a ./pytorch_model-00001-of-00002.bin
8788671a726d25b192134909fb825e0b ./pytorch_model-00002-of-00002.bin
604e0ba32b2cb7df8d8a3d13bddc93fe ./pytorch_model.bin.index.json
413c7f9a8a6517c52c937eed27f18847 ./special_tokens_map.json
2ba2be903e87d7471bbc413e041e70e8 ./tokenizer_config.json
39afcc4541e7931ef0d561ac6e216586 ./tokenizer.model
```
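If `md5sum` is not available (e.g. on Windows), the same check can be done from Python with `hashlib`. This is a small sketch using the checksums listed above; the files are assumed to sit in the current working directory:
```python
import hashlib

# Expected MD5 checksums, copied from the list above.
EXPECTED = {
    "config.json": "e618653f90f163928316858e95bd54d1",
    "generation_config.json": "b1eb3650cbc84466fed263a9f0dff5e2",
    "pytorch_model-00001-of-00002.bin": "570159d90b39554713e9702b9107928a",
    "pytorch_model-00002-of-00002.bin": "8788671a726d25b192134909fb825e0b",
    "pytorch_model.bin.index.json": "604e0ba32b2cb7df8d8a3d13bddc93fe",
    "special_tokens_map.json": "413c7f9a8a6517c52c937eed27f18847",
    "tokenizer_config.json": "2ba2be903e87d7471bbc413e041e70e8",
    "tokenizer.model": "39afcc4541e7931ef0d561ac6e216586",
}

def md5sum(path, chunk_size=1 << 20):
    """Stream the file so the multi-GB .bin shards are not loaded into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

for name, expected in EXPECTED.items():
    actual = md5sum(name)
    print(f"{name}: {'OK' if actual == expected else f'MISMATCH (got {actual})'}")
```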
## Usage
Format your query with the instruction template shown below, run it through the model, then decode the generated response.
Install the required packages:
```
pip install sentencepiece
pip install "transformers>=4.28.0"
```
```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM


def generate_prompt(text):
    # Wrap the user query in the Alpaca-style instruction template.
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{text}

### Response:"""


checkpoint = "DUOMO-Lab/TransGPT-v0"
tokenizer = LlamaTokenizer.from_pretrained(checkpoint)
model = LlamaForCausalLM.from_pretrained(checkpoint).half().cuda()
model.eval()

text = '我想了解如何申请和更新驾驶证?'
prompt = generate_prompt(text)
input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda')

with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=True,  # needed for temperature/top_k/top_p to take effect
        temperature=1.0,
        top_k=20,
        top_p=0.9,
        repetition_penalty=1.15,
    )
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output.replace(text, '').strip())
```
Output:
```shell
我想了解如何申请和更新驾驶证?
```
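For repeated queries, the snippet above can be wrapped in a small helper that reuses the already-loaded `model`, `tokenizer`, and `generate_prompt`. A sketch with the same generation settings; the name `answer` is only illustrative:
```python
def answer(text, max_new_tokens=1024):
    """Generate a response for a single query, reusing the model and tokenizer loaded above."""
    prompt = generate_prompt(text)
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda')
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=1.0,
            top_k=20,
            top_p=0.9,
            repetition_penalty=1.15,
        )
    output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Keep only the text generated after the "### Response:" marker of the template.
    return output.split("### Response:")[-1].strip()

print(answer('我想了解如何申请和更新驾驶证?'))
```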
## Model Source

This release contains the merged (full) model weights.

The Hugging Face weights (.bin files) can be used for:
- training and inference with Transformers
- building a web UI with text-generation-webui

The PyTorch weights (.pth files) can be used for:
- quantization and deployment with llama.cpp

Model files:
```
TransGPT
config.json
generation_config.json
pytorch_model-00001-of-00002.bin
pytorch_model-00002-of-00002.bin
pytorch_model.bin.index.json
special_tokens_map.json
tokenizer.json
tokenizer.model
tokenizer_config.json
```
Hardware requirement: 14 GB of GPU memory.
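If 14 GB of GPU memory is not available, the checkpoint can also be loaded in 8-bit through `bitsandbytes`, which roughly halves the memory footprint at some quality cost. A minimal sketch, assuming `bitsandbytes` and `accelerate` are installed; not verified against this checkpoint:
```python
from transformers import LlamaForCausalLM, LlamaTokenizer

checkpoint = "DUOMO-Lab/TransGPT-v0"
tokenizer = LlamaTokenizer.from_pretrained(checkpoint)
# load_in_8bit quantizes the linear layers via bitsandbytes;
# device_map="auto" lets accelerate place the weights on the available GPU(s).
model = LlamaForCausalLM.from_pretrained(
    checkpoint,
    load_in_8bit=True,
    device_map="auto",
)
model.eval()
```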
### Fine-tuning Datasets

1. ~346k text samples (for in-domain pre-training): [DUOMO-Lab/TransGPT-pt](https://huggingface.co/datasets/DUOMO-Lab/TransGPT-pt)
2. ~56k dialogue samples (for instruction fine-tuning): [finetune_data](https://huggingface.co/data/finetune)

To train the LLaMA model yourself, see [https://github.com/DUOMO/TransGPT](https://github.com/DUOMO/TransGPT).
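The in-domain pre-training corpus can be pulled from the Hugging Face Hub with the `datasets` library. A minimal sketch; the split and column names are assumptions, so inspect the dataset card before relying on them:
```python
from datasets import load_dataset

# Download the in-domain pre-training corpus from the Hugging Face Hub.
ds = load_dataset("DUOMO-Lab/TransGPT-pt")
print(ds)              # shows the available splits and columns
print(ds["train"][0])  # inspect one sample (assumes a "train" split exists)
```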
## Citation
```latex
@software{TransGPT,
author = {Wang Peng},
title = {DUOMO/TransGPT},
year = {2023},
url = {https://github.com/DUOMO/TransGPT},
}
```
## Reference
- https://github.com/shibing624/textgen |