---
language: en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - ruslanmv
  - llama
  - trl
base_model: meta-llama/Meta-Llama-3-8B
datasets:
  - ruslanmv/ai-medical-chatbot
---

Medical-Llama3-8B-GPTQ

This is a fine-tuned version of the Llama3 8B model, designed to answer medical questions. It was trained on the AI Medical Chatbot dataset, available at ruslanmv/ai-medical-chatbot. The model is quantized with GPTQ for efficient 4-bit inference. GPTQ compresses deep learning model weights to a 4-bit representation while controlling the resulting quantization error, substantially reducing memory footprint for GPU inference. At inference time, the quantized weights are dequantized back to float16 on the fly, balancing the benefits of reduced memory usage with computational efficiency.
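For intuition, the snippet below sketches the 4-bit round trip that makes this work: weights are mapped to 16 integer levels with a per-group scale and zero point, then restored to float16 at inference time. This is a simplified round-to-nearest illustration, not the actual GPTQ algorithm (which additionally uses second-order information to minimize quantization error layer by layer); all names here are illustrative.

import torch

def quantize_4bit(w, group_size=128):
    # Map each group of weights to integers in [0, 15] with its own scale/zero point
    w = w.reshape(-1, group_size)
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    scale = (w_max - w_min) / 15.0          # 4 bits -> 16 representable levels
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale + zero), 0, 15).to(torch.uint8)
    return q, scale, zero

def dequantize_4bit(q, scale, zero):
    # Restore float16 weights on the fly, as GPTQ inference kernels do
    return ((q.float() - zero) * scale).to(torch.float16)

w = torch.randn(4096, 128)                  # toy weight matrix
q, scale, zero = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, zero).reshape(w.shape)
print("max abs error:", (w - w_hat.float()).abs().max().item())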

Model: ruslanmv/Medical-Llama3-8B-GPTQ

  • Developed by: ruslanmv
  • License: apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B

Installation

Prerequisites:

  • A system with CUDA support is highly recommended for optimal performance.
  • Python 3.10 or later

Installation Steps:

  1. Install the required Python libraries (auto-gptq provides the AutoGPTQForCausalLM loader used below):

    pip install transformers==4.40.0 auto-gptq
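
  2. (Optional) Confirm the environment before downloading the model. A minimal sanity check, assuming the packages installed in step 1:

    python -c "import torch, transformers; print(transformers.__version__, torch.cuda.is_available())"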
    

Usage

Here's an example of how to use the Medical-Llama3-8B-GPTQ model to generate an answer to a medical question:

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"
repo_id = "ruslanmv/Medical-Llama3-8B-GPTQ"

# Download the quantized model from the Hugging Face Hub and load it onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(repo_id,
                                           device=device,
                                           use_safetensors=True,
                                           use_triton=False)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

def create_prompt(user_query):
  B_INST, E_INST = "<s>[INST]", "[/INST]"
  B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
  DEFAULT_SYSTEM_PROMPT = """\
  You are an AI Medical Chatbot Assistant. I aim to provide comprehensive and informative responses to your inquiries; however, please note that while I strive for accuracy, my responses should not replace professional medical advice, and I keep answers short.
  If a question does not make sense or is not factually coherent, explain why instead of answering something incorrect. If you don't know the answer to a question, please don't share false information."""
  SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS
  instruction = f"User asks: {user_query}\n"
  prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
  return prompt.strip()
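
# Quick sanity check of the assembled prompt; the query below is a
# hypothetical example, not taken from the dataset:
#   print(create_prompt("What are common symptoms of anemia?"))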

def generate_text(model, tokenizer, user_query,
                  max_length=200,
                  temperature=0.7,
                  num_return_sequences=1):
    prompt = create_prompt(user_query)
    # Tokenize the prompt and move it to the same device as the model
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    # Generate text
    output = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        temperature=temperature,
        num_return_sequences=num_return_sequences,
        pad_token_id=tokenizer.eos_token_id,  # Use the end-of-sequence token for padding
        do_sample=True
    )
    # Decode only the newly generated tokens, skipping the prompt
    generated_text = tokenizer.decode(output[0][input_ids.shape[-1]:],
                                      skip_special_tokens=True)
    return generated_text.strip()
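
Note that recent versions of transformers (4.32+) can also load GPTQ checkpoints directly through the standard API when optimum and auto-gptq are installed, so the following should work as an alternative to AutoGPTQForCausalLM.from_quantized; treat this as an untested sketch for this particular checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer

# transformers dispatches GPTQ checkpoints to the auto-gptq kernels via optimum
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)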

Inference Example

This section showcases how to use the model for inference.

User Query:

user_query = "I'm a 35-year-old male experiencing symptoms like fatigue, increased sensitivity to cold, and dry, itchy skin. Could these be indicative of hypothyroidism?"

Answer:

generated_text = generate_text(model, tokenizer, user_query)    
print(generated_text)

You will get an answer similar to the following (the response cuts off where generation hits the max_length token limit):

I understand your concern. It could be attributed to hypothyroidism. You may also have perifollicular inflammation. I suggest you to get your thyroid profile done to rule out hypothyroidism. I would also suggest you to use a mild moisturizing cream, with sunscreen, to

License

This model is licensed under the Apache License 2.0. You can find the full license in the LICENSE file.