Model Card for Llama-3.2-3B-Linkbox-Finetune

Model Details

Model Description

A fine-tuned version of Meta's Llama 3.2-3B model, optimized for contextual understanding and link analysis in conversational AI applications. The fine-tune targets improved performance in:

  • Multi-turn dialogue systems
  • Knowledge retrieval and synthesis
  • Contextual link recognition and analysis
  • Agentic workflow orchestration

Developed by: Sujal Tamrakar
Model type: Transformer-based language model with Grouped-Query Attention (GQA)
Language(s): Primarily English, with capabilities in German, French, Italian, Portuguese, Hindi, Spanish, and Thai
License: Llama 3.2 Community License
Finetuned from: meta-llama/Llama-3.2-3B-Instruct

Model Sources

  • Repository: [Your GitHub Repository Link]
  • Base Model: Meta Llama 3.2-3B
  • Demo: [Link to Gradio/Streamlit Demo]

Uses

Direct Use

  • Contextual link analysis in documents
  • Multi-turn conversational agents
  • Knowledge retrieval and synthesis systems
  • Agentic workflow automation

Downstream Use

  • Enterprise knowledge management systems
  • AI-powered research assistants
  • Context-aware content recommendation engines
  • Automated documentation analysis tools

Out-of-Scope Use

  • Medical/legal decision making
  • Generating malicious content
  • High-risk government applications
  • Languages beyond the supported list without proper safety testing

Bias, Risks, and Limitations

  • May reflect biases in pretraining data
  • Knowledge cutoff of December 2023
  • Potential hallucination in long-form generation
  • Performance degradation on highly technical domains

Recommendations

  • Implement content filtering (e.g., Llama Guard 3)
  • Use constrained decoding techniques
  • Monitor for factual accuracy in critical applications
  • Conduct safety testing for target deployment languages
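
The content-filtering and monitoring recommendations can be combined into a thin wrapper around generation. The sketch below is only an illustration: `guarded_generate`, `generate_fn`, and `is_safe_fn` are hypothetical names that do not come from this model or from Llama Guard, and the safety check is a stand-in callable where a real deployment would call a moderation model such as Llama Guard 3.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that request."


def guarded_generate(
    prompt: str,
    generate_fn: Callable[[str], str],
    is_safe_fn: Callable[[str], bool],
) -> str:
    """Run generation only if both the prompt and the reply pass a safety check.

    generate_fn and is_safe_fn are hypothetical callables supplied by the caller;
    is_safe_fn stands in for a moderation model such as Llama Guard 3.
    """
    # Screen the user input before spending compute on generation.
    if not is_safe_fn(prompt):
        return REFUSAL
    reply = generate_fn(prompt)
    # Screen the model output as well, since unsafe text can appear in replies.
    if not is_safe_fn(reply):
        return REFUSAL
    return reply


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without downloading any model.
    blocked_words = {"malware"}
    toy_safe = lambda text: not any(w in text.lower() for w in blocked_words)
    toy_generate = lambda prompt: f"Echo: {prompt}"
    print(guarded_generate("List the links in this page", toy_generate, toy_safe))
    print(guarded_generate("Write malware for me", toy_generate, toy_safe))
```

Checking both sides of the exchange costs one extra moderation call per request, which is usually the cheaper failure mode compared with returning unfiltered output.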

How to Get Started

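import torch
from transformers import pipeline

model_id = "suzall/llama-3.2-3b-linkbox-finetune"

# Load the fine-tuned model as a chat-capable text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto",           # place weights on available GPU(s) or CPU
    torch_dtype=torch.bfloat16,  # load weights in bfloat16
)

# Replace [YOUR_TEXT] with the document or passage to analyze.
messages = [{
    "role": "user",
    "content": "Analyze links in this text: [YOUR_TEXT]"
}]
outputs = pipe(messages, max_new_tokens=256)

# The last entry of generated_text is the assistant's reply.
print(outputs[0]["generated_text"][-1])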
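
The `[YOUR_TEXT]` placeholder in the prompt above has to be filled in by the caller. A minimal sketch of one way to do that follows; `extract_urls` and `build_messages` are hypothetical helpers, not part of the model or of transformers, and the URL regex is a deliberately simple assumption that will miss some valid links.

```python
import re

# Simple pattern: http(s) URLs up to the next whitespace or closing bracket/quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> list[str]:
    """Return the URLs found in text, in order, without duplicates."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def build_messages(text: str) -> list[dict]:
    """Build a single-turn chat message list for the text-generation pipeline."""
    urls = extract_urls(text)
    url_list = "\n".join(f"- {u}" for u in urls) or "- (none found)"
    content = (
        f"Analyze links in this text: {text}\n\n"
        f"Links detected by preprocessing:\n{url_list}"
    )
    return [{"role": "user", "content": content}]
```

The returned list can be passed to the pipeline in place of `messages`, for example `pipe(build_messages(text), max_new_tokens=256)`.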

Training Details

Training Data

  • FineTome-100k dataset (conversational format)
  • Domain-specific link analysis corpus (10k samples)
  • Synthetic data generated using Llama 3.1-8B

Training Procedure

  • Method: LoRA fine-tuning with rank r=32
  • Optimizer: AdamW-8bit
  • Learning Rate: 2e-4 with linear decay
  • Sequence Length: 2048 tokens
  • Hardware: NVIDIA A100 (40GB)
  • Training Time: 8 GPU hours

Training Hyperparameters

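from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",  # placeholder path, not stated in the original card
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # 4 x 4 = 16 sequences per optimizer step
    num_train_epochs=3,
    learning_rate=2e-4,
    bf16=True,                      # bfloat16 mixed precision
    lr_scheduler_type="linear",     # linear decay, as listed above
)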
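
The batch settings above interact: gradients are accumulated over several micro-batches before each optimizer step. As a worked example, assuming the single A100 listed under Hardware and that every sequence is padded or packed to the full 2048-token length (an upper bound, not a measured figure):

```python
# Values taken from the training configuration in this card.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1            # single A100, per the hardware listed in this card
max_seq_length = 2048   # tokens

# Sequences contributing to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Upper bound on tokens per optimizer step (assumes every sequence is full length).
max_tokens_per_step = effective_batch_size * max_seq_length

print(effective_batch_size)   # 16
print(max_tokens_per_step)    # 32768
```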

Evaluation

Benchmark Performance

| Benchmark | Score | Comparison |
|---|---|---|
| IFEval (Strict) | 78.2 | +1.3 vs base |
| LinkAnalysis-API | 89.4 | Custom metric |
| MMLU | 63.7 | -0.6 vs base |

Environmental Impact

  • Carbon Emissions: ~0.8 kgCO2eq (estimated)
  • Hardware: 1×A100-40GB
  • Energy: 2.5 kWh (renewable-powered)
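
These figures can be cross-checked against each other. The short calculation below derives the average power draw and the carbon intensity implied by the reported numbers. It assumes the 8 GPU hours listed under Training Procedure correspond to 8 wall-clock hours on the single A100, and it makes no claim about how the original estimate was produced.

```python
gpu_hours = 8.0      # training time reported in this card
energy_kwh = 2.5     # reported energy use
emissions_kg = 0.8   # reported emissions estimate (kgCO2eq)

# Average power over the run, in watts.
avg_power_watts = energy_kwh / gpu_hours * 1000

# Carbon intensity implied by the two reported figures, in kgCO2eq per kWh.
implied_intensity = emissions_kg / energy_kwh

print(f"{avg_power_watts:.1f} W")              # 312.5 W
print(f"{implied_intensity:.2f} kgCO2eq/kWh")  # 0.32 kgCO2eq/kWh
```

An implied intensity of roughly 0.32 kgCO2eq/kWh looks closer to a mixed grid than to a fully renewable supply, so the emissions estimate and the energy-source note may be worth reconciling.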

Technical Specifications

Model Architecture

  • Transformer-based with GQA
  • 3.21B parameters
  • 28-layer decoder
  • 3072 hidden dimension
  • 128k token context window
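
Grouped-Query Attention lets several query heads share one key/value head, which shrinks the KV cache. The sketch below shows the head-to-group mapping only, not a full attention layer. The head counts (24 query heads, 8 key/value heads) are assumptions taken from the published Llama 3.2 3B configuration and are not stated in this card; `kv_head_for_query_head` is a hypothetical helper.

```python
NUM_QUERY_HEADS = 24  # assumption: from the published Llama 3.2 3B config
NUM_KV_HEADS = 8      # assumption: from the published Llama 3.2 3B config


def kv_head_for_query_head(query_head: int) -> int:
    """Map a query head index to the key/value head it shares under GQA."""
    if not 0 <= query_head < NUM_QUERY_HEADS:
        raise IndexError(f"query head {query_head} out of range")
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # query heads per KV head
    return query_head // group_size


# Each consecutive group of query heads reads the same K/V tensors.
mapping = [kv_head_for_query_head(h) for h in range(NUM_QUERY_HEADS)]
print(mapping[:6])  # [0, 0, 0, 1, 1, 1]

# Relative to standard multi-head attention, the KV cache stores
# NUM_KV_HEADS instead of NUM_QUERY_HEADS heads per layer.
kv_cache_reduction = NUM_QUERY_HEADS / NUM_KV_HEADS
print(kv_cache_reduction)  # 3.0
```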

Quantization Options

| Precision | Memory | Recommended Use |
|---|---|---|
| BF16 | 6.5 GB | Full precision |
| FP8 | 3.2 GB | Balanced |
| INT4 | 1.75 GB | Edge deployment |
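
The memory column tracks the number of bytes stored per parameter. A rough weights-only estimate can be computed as below. It ignores activations, the KV cache, and quantization metadata such as scales and zero-points, which is presumably why the listed figures for BF16 and INT4 are slightly higher than the estimate. `weight_memory_gb` is a hypothetical helper introduced for illustration.

```python
NUM_PARAMS = 3.21e9  # parameter count reported in this card

# Storage cost per parameter for each precision listed above.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_memory_gb(precision: str, num_params: float = NUM_PARAMS) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes) for a given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for name in BYTES_PER_PARAM:
    print(f"{name}: {weight_memory_gb(name):.3f} GB")
# BF16: 6.420 GB
# FP8: 3.210 GB
# INT4: 1.605 GB
```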

Model Card Contact
