SebastianSchramm
/

UniNER-7B-definition-GPTQ-4bit-128g-actorder_True

Text Generation

text-generation-inference

4-bit precision

Model card Files Files and versions Community

UniNER-7B-definition-GPTQ-4bit-128g-actorder_True / README.md

SebastianSchramm's picture

SebastianSchramm

Update README.md

dfe5db1 over 1 year ago

|

1.63 kB

	---
	language:
	- en
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- LLM
	- Universal-NER
	- NER
	- 4bit
	inference: false
	---
	![image](qunatized_lama_color_letters_4bit_512px.png)

	# Quantized version of Universal-NER/UniNER-7B-definition

	[Universal-NER/UniNER-7B-definition](https://huggingface.co/Universal-NER/UniNER-7B-definition) quantized to 4bit with GPTQ and stored with 1GB shard size.

	## Model Description

	The model [Universal-NER/UniNER-7B-definition](https://huggingface.co/Universal-NER/UniNER-7B-definition) was quantized to 4bit, group_size 128, and act-order=True with auto-gptq integration in transformers (https://huggingface.co/blog/gptq-integration).

	## Evaluation
	TODO

	## Prompt template

	Prompt template is the same as for the full precision model:

	```python
	prompt_template = """A virtual assistant answers questions from a user based on the provided text.
	USER: Text: {input_text}
	ASSISTANT: I’ve read this text.
	USER: What describes {entity_name} in the text?
	ASSISTANT:
	"""
	```

	## Usage

	It is recommended to format input according to the prompt template mentioned above during inference for best results.

	```python
	prompt = prompt_template.format_map({"input_text": "Cologne is a great city in Germany - maybe even the greatest ;)", "entity_name": "city"})
	```

	The model is small enough to be loaded in free-tier Colab with a T4 GPU: https://gist.github.com/sebastianschramm/b849c06676c6601d9a87270e83f5a157

	## License
	The original full precision model and its associated data are released under the CC BY-NC 4.0 license. Hence, the same license applies for the 4bit version.