---
license: llama2
language:
- en
metrics:
- perplexity
---
# Model Card for unicode-llama-2-chat-Hf-q4-2
<!-- Provide a quick summary of what the model is/does. -->
**Model Developers:** Ranjanunicode (Ranjan Pandit)

A quantized version of the Llama 2 chat Hugging Face model that can be run with minimal hardware requirements.

**Input:** Models take text input only.

**Output:** Models generate text only.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Ranjan Pandit
- **Model type:** Quantized version of "meta-llama/Llama-2-7b-chat-hf"
- **Finetuned from model [optional]:** "meta-llama/Llama-2-7b-chat-hf"
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** https://arxiv.org/abs/2310.19102
## Uses
- Intended Use Cases: unicode-llama-2-chat-Hf-q4-2 is intended for commercial and research use in English.
- Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.
- To get the expected features and performance for the chat versions, a specific prompt format must be followed, including the `[INST]` and `<<SYS>>` tags, the BOS and EOS tokens, and the whitespace and line breaks in between (we recommend calling `strip()` on inputs to avoid double spaces); a sketch of this template follows the code examples below. See the reference code on GitHub for details: `chat_completion`.
- First, install ctransformers:
```
!pip install "ctransformers>=0.2.24"
```
- Then use the following to get started:
```
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to the GPU.
# Set it to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained(
    "Ranjanunicode/unicode-llama-2-chat-Hf-q4-2",
    model_file="unicode-llama-2-chat-Hf-q4-2.gguf",
    model_type="llama",
    gpu_layers=40,
)
print(llm("AI is going to"))
```
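The sketch below shows the single-turn Llama 2 chat template referenced in the Uses section, applied to this model through ctransformers. It is a minimal sketch: the system prompt and user message are illustrative placeholders, not fixed values from this repository.
```
from ctransformers import AutoModelForCausalLM

# Load the model; gpu_layers=0 keeps everything on the CPU.
llm = AutoModelForCausalLM.from_pretrained(
    "Ranjanunicode/unicode-llama-2-chat-Hf-q4-2",
    model_file="unicode-llama-2-chat-Hf-q4-2.gguf",
    model_type="llama",
    gpu_layers=0,
)

# Single-turn Llama 2 chat template: the system prompt sits inside
# <<SYS>> tags and the user turn is wrapped in [INST] ... [/INST].
# strip() avoids the double spaces mentioned in the Uses section.
system_prompt = "You are a helpful, honest assistant."  # placeholder
user_message = "Explain model quantization in one paragraph."  # placeholder
prompt = (
    f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    f"{user_message.strip()} [/INST]"
)

print(llm(prompt, max_new_tokens=256))
```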
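For interactive use, ctransformers can also stream tokens as they are generated; this snippet reuses `llm` and `prompt` from the sketch above:
```
# stream=True yields tokens one at a time instead of returning
# the whole completion at once.
for token in llm(prompt, max_new_tokens=256, stream=True):
    print(token, end="", flush=True)
print()
```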
### Out-of-Scope Use
- Use in any manner that violates applicable laws or regulations (including trade compliance laws).
- Use in languages other than English.
- Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.
### Compute Infrastructure
- Google Colab with a Tesla T4 GPU.
## Citation [optional]
- Meta Llama 2
- https://arxiv.org/abs/2310.19102
## Model Card Authors [optional]
- Ranjan Pandit
## Model Card Contact
- "[email protected]" |