Kadins/Llama-3.2-Vision-chinese-lora

Image-Text-to-Text

Kadins committed on Oct 22, 2024

Commit 04bbbb7 · verified · 1 Parent(s): bd56a10

Add model usage examples

Files changed (1)

README.md +78 -3

README.md CHANGED

@@ -1,3 +1,78 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- zh
+base_model:
+- meta-llama/Llama-3.2-11B-Vision-Instruct
+tags:
+- llama
+- lora
+- chinese
+- zh
+---
+# Llama-3.2-Vision-chinese-lora
+- base model: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
+## Use with transformers
+```python
+import torch
+from transformers import MllamaForConditionalGeneration, AutoProcessor
+from peft import PeftModel
+from PIL import Image
+# Base model ID and LoRA model ID
+base_model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
+lora_model_id = "Kadins/Llama-3.2-Vision-chinese-lora"
+# Load the processor
+processor = AutoProcessor.from_pretrained(base_model_id)
+# Load the base model
+base_model = MllamaForConditionalGeneration.from_pretrained(
+    base_model_id,
+    device_map="auto",
+    torch_dtype=torch.float16  # Use torch.bfloat16 if your hardware supports it
+).eval()
+# Load the LoRA model and apply it to the base model
+model = PeftModel.from_pretrained(base_model, lora_model_id)
+# Optionally, merge the LoRA weights with the base model for faster inference
+model = model.merge_and_unload()
+# Load an example image (replace 'path_to_image.jpg' with your image file)
+image_path = 'path_to_image.jpg'
+image = Image.open(image_path)
+# User prompt in Chinese
+user_prompt = "请描述这张图片。"
+# Prepare the content with the image and text
+content = [
+    {"type": "image", "image": image},
+    {"type": "text", "text": user_prompt}
+]
+# Apply the chat template to create the prompt
+prompt = processor.apply_chat_template(
+    [{"role": "user", "content": content}],
+    add_generation_prompt=True
+)
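+# Prepare the inputs for the model. The chat template already inserts the
+# begin-of-text token, so the tokenizer must not add a second one.
+inputs = processor(
+    images=image,
+    text=prompt,
+    add_special_tokens=False,
+    return_tensors="pt"
+).to(model.device)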
+# Generate the model's response
+output = model.generate(**inputs, max_new_tokens=512)
+# Decode only the newly generated tokens so the response excludes the prompt
+prompt_length = inputs["input_ids"].shape[-1]
+response = processor.decode(output[0][prompt_length:], skip_special_tokens=True)
+# Print the assistant's response
+print("Assistant:", response)
+```
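
The message structure passed to `processor.apply_chat_template` in the README example can be factored into a small helper. The sketch below is only an illustration: `build_user_message` is a hypothetical name, not part of transformers or this repository, and it assumes the chat template needs only an image placeholder entry (`{"type": "image"}`), with the actual image supplied separately to the processor via `images=`.

```python
from typing import Any, Dict, List


def build_user_message(user_prompt: str) -> List[Dict[str, Any]]:
    """Build a single-turn conversation: one image placeholder, then the text.

    Hypothetical helper for illustration. The image itself is not stored here;
    it is passed to the processor separately through its `images` argument.
    """
    if not user_prompt.strip():
        raise ValueError("user_prompt must not be empty")
    content = [
        {"type": "image"},                      # placeholder for the image
        {"type": "text", "text": user_prompt},  # the user's question
    ]
    return [{"role": "user", "content": content}]


# Same Chinese prompt as in the README ("Please describe this image.")
messages = build_user_message("请描述这张图片。")
```

The returned `messages` list could then be passed as the first argument to `processor.apply_chat_template(..., add_generation_prompt=True)` in place of the inline list.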