Model Card for vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF

This model is a fine-tuned version of Llama-2-7b-chat on company-specific question-answer data. It is designed for efficient inference while maintaining high-quality output, making it suitable for conversational AI applications.

Full Tutorial on Cheap Finetuning

https://github.com/VishanOberoi/FineTuningForTheGPUPoor?tab=readme-ov-file

Model Details

It was fine-tuned using QLoRA and PEFT. After fine-tuning, the LoRA adapters were merged into the base model, and the merged model was quantized to GGUF.
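
The merge-and-quantize step follows the standard PEFT flow. Below is a minimal sketch, assuming the trained adapters are saved at a hypothetical local path `./lora-adapters`; the llama.cpp commands in the trailing comments are illustrative and depend on your llama.cpp version.

```python
# Minimal sketch: merge QLoRA adapters into the base model with peft,
# then convert/quantize the merged checkpoint with llama.cpp.
# "./lora-adapters" is a hypothetical path to the trained adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
)
# Apply the adapters and fold their weights back into the base model.
merged = PeftModel.from_pretrained(base, "./lora-adapters").merge_and_unload()

merged.save_pretrained("./merged-model")
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf").save_pretrained("./merged-model")

# The merged checkpoint can then be converted and quantized with llama.cpp, e.g.:
#   python convert.py ./merged-model --outfile finetuned-f16.gguf
#   ./quantize finetuned-f16.gguf finetuned.gguf Q4_K_M
```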

Uses

This model is optimized for direct use in conversational AI, particularly for generating responses grounded in company-specific data. It can be used in customer-service bots, FAQ bots, and other applications that require accurate, contextually relevant answers.

Example with ctransformers:

```python
from ctransformers import AutoModelForCausalLM

# Load the GGUF model; gpu_layers controls how many layers are offloaded to the GPU.
llm = AutoModelForCausalLM.from_pretrained(
    "vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF",
    model_file="finetuned.gguf",
    model_type="llama",
    gpu_layers=50,
    max_new_tokens=2000,
    temperature=0.2,
    top_k=40,
    top_p=0.6,
    context_length=6000,
)

system_prompt = "You are a useful bot..."
user_prompt = "Tell me about your company"

# Combine system and user prompts using the Llama-2 chat template
# (the <<SYS>> block belongs inside the [INST] tags).
full_prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} [/INST]"

# Generate the response
response = llm(full_prompt)

# Print the response
print(response)
```
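
For chat-style applications you may prefer to stream tokens as they are generated rather than waiting for the full reply. A small sketch, reusing the `llm` handle and `full_prompt` from above:

```python
# ctransformers yields text chunks incrementally when stream=True.
for token in llm(full_prompt, stream=True):
    print(token, end="", flush=True)
print()
```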