---
language:
- en
library_name: transformers
pipeline_tag: question-answering
tags:
- Finetuning
---
# Model Card for vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF
This model is a fine-tuned version of Llama-2-7b-chat-hf on company-specific question-answer data. It is quantized for efficient inference while maintaining output quality, making it suitable for conversational AI applications.
## Model Details
The model was fine-tuned with QLoRA using the PEFT library. After fine-tuning, the LoRA adapters were merged into the base model, and the merged model was quantized to GGUF format.
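A minimal sketch of the adapter-merge step using `peft` (the adapter path below is a placeholder for illustration, not a published artifact):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and apply the trained QLoRA adapters
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = PeftModel.from_pretrained(base, "path/to/qlora-adapters")  # placeholder path

# Fold the LoRA weights into the base weights, then save the merged model
merged = model.merge_and_unload()
merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf").save_pretrained("merged-model")
```

The merged checkpoint can then be converted and quantized to GGUF with llama.cpp's conversion tooling.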
- **Developed by:** Vishan Oberoi and Dev Chandan.
- **Model type:** Transformer-based Large Language Model
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
### Model Sources
- **Repository:** [vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF](https://huggingface.co/vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF)
- **Links:**
- LLaMA: [LLaMA Paper](https://arxiv.org/abs/2302.13971)
- QLORA: [QLORA Paper](https://arxiv.org/abs/2305.14314)
  - llama.cpp: [llama.cpp repository](https://github.com/ggerganov/llama.cpp)
## Uses
This model is optimized for direct use in conversational AI, particularly for generating responses based on company-specific data. It can be utilized effectively in customer service bots, FAQ bots, and other applications where accurate and contextually relevant answers are required.
## Usage notebook
[Colab notebook](https://colab.research.google.com/drive/1885wYoXeRjVjJzHqL9YXJr5ZjUQOSI-w?authuser=4#scrollTo=TZIoajzYYkrg)
#### Example with `ctransformers`:
```python
from ctransformers import AutoModelForCausalLM

# Load the GGUF model; offload up to 50 layers to the GPU if one is available
llm = AutoModelForCausalLM.from_pretrained(
    "vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF",
    model_file="finetuned.gguf",
    model_type="llama",
    gpu_layers=50,
    max_new_tokens=2000,
    temperature=0.2,
    top_k=40,
    top_p=0.6,
    context_length=6000,
)

system_prompt = "You are a useful bot"
user_prompt = "Tell me about your company"

# Llama-2-Chat expects the system prompt inside the [INST] block, wrapped in <<SYS>> tags
full_prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} [/INST]"

# Generate and print the response
response = llm(full_prompt)
print(response)
```
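
#### Example with `llama-cpp-python` (alternative):

The same GGUF file can also be loaded with `llama-cpp-python`. A minimal sketch, assuming the file name `finetuned.gguf` as above:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF file from the Hub (cached locally after the first call)
model_path = hf_hub_download(
    repo_id="vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF",
    filename="finetuned.gguf",
)

# Load the model with the same context size and GPU offload as the example above
llm = Llama(model_path=model_path, n_ctx=6000, n_gpu_layers=50)

prompt = "[INST] <<SYS>>\nYou are a useful bot\n<</SYS>>\n\nTell me about your company [/INST]"
output = llm(prompt, max_tokens=2000, temperature=0.2, top_k=40, top_p=0.6)
print(output["choices"][0]["text"])
```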