---
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- LLM
- Universal-NER
- NER
- 4bit
inference: false
---

![image](qunatized_lama_color_letters_4bit_512px.png)

# Quantized version of Universal-NER/UniNER-7B-definition

[Universal-NER/UniNER-7B-definition](https://huggingface.co/Universal-NER/UniNER-7B-definition) quantized to 4-bit with GPTQ and saved in 1 GB shards.

## Model Description

The model [Universal-NER/UniNER-7B-definition](https://huggingface.co/Universal-NER/UniNER-7B-definition) was quantized to 4 bits with GPTQ (group_size 128, act-order=True) using the AutoGPTQ integration in transformers (https://huggingface.co/blog/gptq-integration).
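
The settings above correspond to transformers' `GPTQConfig`, where act-order is exposed as `desc_act`. The following is a sketch of how such a quantization could be reproduced, not the author's actual script: the calibration dataset (`"c4"`) is an assumed placeholder because the card does not name one, and running `quantize` requires a CUDA GPU with `optimum` and `auto-gptq` installed.

```python
# Settings stated in this card: 4-bit, group_size 128, act-order=True.
# In transformers' GPTQConfig, act-order is the `desc_act` flag.
QUANT_SETTINGS = {"bits": 4, "group_size": 128, "desc_act": True}

BASE_MODEL = "Universal-NER/UniNER-7B-definition"


def quantize(output_dir: str, calibration_dataset: str = "c4") -> None:
    """Quantize the base model with GPTQ and save it in 1 GB shards.

    Assumption: "c4" is a placeholder calibration dataset; the card does
    not say which dataset was actually used.
    """
    # Heavy imports are kept local so the settings above can be inspected
    # without transformers/optimum/auto-gptq installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    gptq_config = GPTQConfig(
        **QUANT_SETTINGS,
        dataset=calibration_dataset,
        tokenizer=tokenizer,
    )
    # Passing quantization_config triggers GPTQ quantization while loading.
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL,
        device_map="auto",
        quantization_config=gptq_config,
    )
    # Split the saved weights into shards of at most 1 GB each.
    model.save_pretrained(output_dir, max_shard_size="1GB")
    tokenizer.save_pretrained(output_dir)
```

For example, `quantize("UniNER-7B-definition-GPTQ-4bit")` would write the shards to that (hypothetical) local directory.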

## Evaluation

TODO

## Prompt template

The prompt template is the same as for the full-precision model:

```python
prompt_template = """A virtual assistant answers questions from a user based on the provided text.
USER: Text: {input_text}
ASSISTANT: I’ve read this text.
USER: What describes {entity_name} in the text?
ASSISTANT:
"""
```

## Usage

For best results, format inputs according to the prompt template above during inference.

```python
prompt = prompt_template.format_map(
    {
        "input_text": "Cologne is a great city in Germany - maybe even the greatest ;)",
        "entity_name": "city",
    }
)
```
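
A minimal end-to-end sketch, assuming this checkpoint loads through `AutoModelForCausalLM` (which needs `optimum` and `auto-gptq` installed) and that the model replies with a JSON-style list of strings such as `["Cologne"]`; answers that do not parse are treated as empty. `MODEL_ID` is a hypothetical placeholder for this model's Hub ID or a local path, and `parse_entities` and `extract` are illustrative helpers. Because the template has a single `{entity_name}` slot, each entity type gets its own prompt.

```python
import json

# Hypothetical placeholder: replace with the Hub ID or local path of this 4-bit model.
MODEL_ID = "path/to/UniNER-7B-definition-4bit"

# Same template as in the "Prompt template" section.
PROMPT_TEMPLATE = """A virtual assistant answers questions from a user based on the provided text.
USER: Text: {input_text}
ASSISTANT: I’ve read this text.
USER: What describes {entity_name} in the text?
ASSISTANT:
"""


def parse_entities(answer: str) -> list[str]:
    """Parse an answer like '["Cologne"]'; return [] if it is not a JSON list."""
    try:
        parsed = json.loads(answer.strip())
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [item for item in parsed if isinstance(item, str)]


def extract(text: str, entity_types: list[str], max_new_tokens: int = 64) -> dict[str, list[str]]:
    """Run one prompt per entity type and collect the parsed answers."""
    # Local imports: loading a GPTQ checkpoint needs transformers, optimum,
    # auto-gptq and a CUDA GPU, none of which parse_entities requires.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    results = {}
    for entity_type in entity_types:
        prompt = PROMPT_TEMPLATE.format_map({"input_text": text, "entity_name": entity_type})
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        # Decode only the newly generated tokens, not the echoed prompt.
        new_tokens = output[0, inputs["input_ids"].shape[1]:]
        answer = tokenizer.decode(new_tokens, skip_special_tokens=True)
        results[entity_type] = parse_entities(answer)
    return results
```

Greedy decoding (`do_sample=False`) keeps the extraction deterministic. A call might look like `extract("Cologne is a great city in Germany - maybe even the greatest ;)", ["city", "country"])`.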

The 4-bit model is small enough to load in free-tier Colab with a T4 GPU; see https://gist.github.com/sebastianschramm/b849c06676c6601d9a87270e83f5a157
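
A rough estimate of why the weights fit on a T4 (16 GB). It assumes LLaMA-7B shapes for UniNER-7B (hidden size 4096, 32 layers, MLP size 11008, vocabulary 32000; these values are not stated in this card). It also counts an fp16 scale plus a 4-bit zero point per group of 128 weights and keeps the embedding and `lm_head` in fp16. Norm weights, act-order index tensors, activations, and the KV cache are ignored.

```python
# Assumed LLaMA-7B shapes (UniNER-7B is LLaMA-based; not stated in this card).
HIDDEN = 4096
INTERMEDIATE = 11008
LAYERS = 32
VOCAB = 32000
GROUP_SIZE = 128  # from this card


def param_counts() -> tuple[int, int]:
    """Return (GPTQ-quantized linear weights, fp16 embedding + lm_head weights)."""
    attn = 4 * HIDDEN * HIDDEN          # q, k, v, o projections
    mlp = 3 * HIDDEN * INTERMEDIATE     # gate, up, down projections
    quantized = LAYERS * (attn + mlp)
    unquantized = 2 * VOCAB * HIDDEN    # input embedding and lm_head stay fp16
    return quantized, unquantized


def weight_bytes(bits: int = 4) -> float:
    """Approximate weight memory of the quantized model in bytes."""
    quantized, unquantized = param_counts()
    # Each group of GROUP_SIZE weights also stores a 16-bit scale and a `bits`-bit zero point.
    bits_per_weight = bits + (16 + bits) / GROUP_SIZE
    return quantized * bits_per_weight / 8 + unquantized * 2


def fp16_bytes() -> int:
    """Weight memory if every parameter were stored in fp16."""
    quantized, unquantized = param_counts()
    return (quantized + unquantized) * 2


print(f"4-bit GPTQ weights: ~{weight_bytes() / 1e9:.1f} GB")
print(f"fp16 weights:       ~{fp16_bytes() / 1e9:.1f} GB")
```

Under these assumptions the 4-bit weights take roughly 3.9 GB versus about 13.5 GB in fp16, leaving headroom on the T4 for activations and the KV cache.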

## License

The original full-precision model and its associated data are released under the CC BY-NC 4.0 license; hence, the same license applies to the 4-bit version.