# Llama 3.2 11B-Vision-Instruct Model on Hugging Face

This repository hosts the `Llama 3.2 11B-Vision-Instruct` model, fine-tuned to generate TikZ code from captions and images, suitable for enhancing scientific visualizations.

## Model Description

The `Llama 3.2 11B-Vision-Instruct` is a multimodal model that combines the textual understanding and generation capabilities of Llama 3.2 with a specialized vision encoder, fusing detailed visual embeddings with textual input to produce high-quality output.

## Installation

Ensure you have PyTorch and Transformers installed in your environment, along with `peft`, `accelerate` (required for `device_map="auto"`), and `pillow`. If not, you can install them using pip:

```bash
pip install torch transformers peft accelerate pillow
```

## Usage

Load the fine-tuned model and processor, and wrap the model with a LoRA adapter:

```python
import logging

import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import LoraConfig, get_peft_model

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

model_id = "oe2015/llama-3.2-finetuned-tikzcode"

lora_config = LoraConfig(
    r=16,                                 # Rank of the low-rank decomposition, typically small (e.g., 8, 16)
    lora_alpha=32,                        # Scaling factor for the LoRA parameters
    target_modules=["q_proj", "v_proj"],  # Apply LoRA to the query and value projections in the attention layers
    lora_dropout=0.1,                     # Dropout applied to the LoRA layers
    bias="none"                           # LoRA doesn't update biases by default; change to "all" if needed
)

# Load the model and processor
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = get_peft_model(model, lora_config)
processor = AutoProcessor.from_pretrained(model_id)
```
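Once the model and processor are loaded as above, they can be used for inference. The sketch below is a minimal, illustrative example of generating TikZ code from an image and a caption, reusing the `model` and `processor` objects from the previous snippet. The file name `figure.png`, the example caption, the prompt wording, and `max_new_tokens=1024` are assumptions rather than values specified by this repository; adjust them to your use case.

```python
from PIL import Image

# Hypothetical inputs: replace with your own reference image and caption.
image = Image.open("figure.png")
caption = "A bar chart comparing model accuracy across three datasets."

# Build a chat-style prompt; the exact wording expected by the fine-tune is an assumption.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": f"Generate TikZ code that reproduces this figure. Caption: {caption}"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Preprocess the image and prompt, then generate the TikZ code.
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)

print(processor.decode(output[0], skip_special_tokens=True))
```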