TeLVE: Turkish efficient Language Vision Engine 🧿

License: CC BY 4.0 · Models: v1.0

First Turkish VLM ever!

TeLVE is the first Visual Language Model specifically designed for Turkish language understanding and image description generation. Built on Vision Transformer (ViT) and BERT pre-trained encoder architectures, it bridges the gap in Turkish visual-linguistic processing.

Model Description

TeLVE combines:

  • 🖼️ Vision Transformer (ViT-base-patch16-224)
  • 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
  • 🔄 Cross-attention mechanism for vision-language fusion
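As a rough illustration of this pairing (not the project's actual code), the same architecture can be wired together with the Hugging Face transformers library, which adds cross-attention layers to the text model when it is used as a decoder. The checkpoint names come from the list above; everything else in the sketch is an assumption.

# Hedged sketch: fusing the ViT encoder with Turkish BERT via cross-attention
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224",      # vision encoder
    "dbmdz/bert-base-turkish-cased",    # Turkish BERT, used as the text decoder
)
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Decoding configuration needed for caption generation
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id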

Version Logs

  • TeLVE v1.0: Trained on Unsplash Lite dataset
  • TeLVE v1.0dep: Dataset extended with selected images from Pexels; the encoding problem with the letter "ü" was fixed. (Deprecated: performance regressed because of a dataset addressing problem. Not recommended for use.)

Usage

The model can be used in two ways:

Inference (imagine.py)

# Generate captions for images
python imagine.py

This script:

  • Loads a trained TeLVE model
  • Reads images from the images directory
  • Generates Turkish captions for each image
  • Outputs the results to console
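For reference, a minimal sketch of what such an inference step could look like with the transformers API is shown below. The checkpoint path and image filename are placeholders, and imagine.py may implement these steps differently.

# Hedged sketch of single-image captioning; paths are placeholders
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model_dir = "./models/TeLVE"  # assumed location of the trained checkpoint
model = VisionEncoderDecoderModel.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")

image = Image.open("images/example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Beam-search decoding of the Turkish caption
output_ids = model.generate(pixel_values, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))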

Training (main.py)

Users can train their own models with ViT and BERT encoders.

# Train a new model
python main.py

This script:

  • Loads and preprocesses image-caption pairs
  • Initializes ViT and BERT encoders
  • Trains the combined model
  • Saves the model and tokenizer
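As an illustration of those steps, the sketch below fine-tunes the ViT + BERT pairing on image-caption pairs. The sample pair, hyperparameters, and output path are assumptions and do not necessarily match main.py.

# Hedged sketch of the training loop; data and hyperparameters are placeholders
import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224", "dbmdz/bert-base-turkish-cased")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Placeholder image-caption pairs; the real dataset is loaded by main.py
pairs = [("images/0001.jpg", "Sahilde gün batımı")]
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for path, caption in pairs:
        pixel_values = processor(images=Image.open(path).convert("RGB"),
                                 return_tensors="pt").pixel_values
        labels = tokenizer(caption, padding="max_length", truncation=True,
                           max_length=64, return_tensors="pt").input_ids
        labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
        loss = model(pixel_values=pixel_values, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("./models/TeLVE")
tokenizer.save_pretrained("./models/TeLVE")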

Performance

Performance scores have not yet been evaluated; benchmark results will be added in a future release.

Citation

@software{telve2024,
    author = {Öğüt Su Karagün},
    title = {TeLVE: Turkish efficient Language Vision Engine},
    year = {2024},
    url = {https://huggingface.co/outsu/TeLVE}
}

License

TeLVE © 2024 by Öğüt Su Karagün is licensed under Creative Commons Attribution 4.0 International
