|
--- |
|
license: mit |
|
language: |
|
- tr |
|
pipeline_tag: image-text-to-text |
|
tags: |
|
- Turkish |
|
- turkish |
|
- LLaVA |
|
datasets: |
|
- liuhaotian/LLaVA-CC3M-Pretrain-595K |
|
--- |
|
|
|
<img src="./CosmosLLaVA.png"/> |
|
|
|
# Llava-CosmosLlama |
|
|
|
This is a Turkish visual language model designed for multi-modal visual instruction-following tasks. It utilizes the LLaVA (Large Language and Vision Assistant) architecture, integrating the `ytucosmos/Turkish-Llama-8b-Instruct-v0.1` language model. The model is capable of processing both visual (image) and textual inputs, allowing it to understand and execute instructions provided in Turkish. |
|
|
|
# Model Details |
|
The model was pretrained on **[LLaVA-CC3M-Pretrain-595K](https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K)** dataset, which was translated to Turkish using DeepL Translate.<br> |
|
It was further fine-tuned using subsets the following datasets to enhance its visual reasoning and understanding capabilities: |
|
- **[Stanford GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html)** |
|
- **[VisualGenome](https://homes.cs.washington.edu/~ranjay/visualgenome/index.html)** |
|
- **[COCO](https://cocodataset.org/#home)** |
|
- **110K multi-turn instruction following data** consisting of **book covers**, to enhance models capabilities on tasks regarding OCR. |
|
|
|
## Example Usage |
|
|
|
#### Using lmdeploy |
|
|
|
1. Install requirements: |
|
```sh |
|
conda create -n lmdeploy python=3.8 -y |
|
conda activate lmdeploy |
|
pip install lmdeploy |
|
``` |
|
|
|
2. Run the following code: |
|
|
|
```python |
|
from lmdeploy import pipeline, ChatTemplateConfig |
|
from lmdeploy.vl import load_image |
|
|
|
pipe = pipeline("ytu-ce-cosmos/Turkish-LLaVA-v0.1", |
|
chat_template_config=ChatTemplateConfig(model_name='llama3')) |
|
|
|
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-im-captioning.jpg" |
|
image = load_image(url) |
|
|
|
response = pipe(('Bu resimde öne çıkan ögeler nelerdir?', image)) |
|
|
|
print(response) |
|
|
|
""" |
|
Resimde, çiçeklerle dolu bir bahçede yavru bir köpek ve arka planda bir ağaç yer alıyor. |
|
Köpek, çiçeklerin arasında otururken ve etrafını saran çiçeklerin arasından bakarken görülebiliyor. |
|
Bu sahne, köpeğin bahçede geçirdiği zamanın tadını çıkardığı ve çevresini keşfettiği sakin ve huzurlu bir atmosferi yansıtıyor. |
|
""" |
|
``` |
|
|
|
Image used in this example: |
|
<img src="./example.jpg"/> |
|
|
|
# Acknowledgments |
|
- Computing resources used in this work were provided by the National Center for High Performance Computing of Turkey (UHeM). |
|
- Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗 |
|
|
|
|
|
# Citation |
|
```bibtex |
|
Paper Coming Soon ... |
|
... |
|
``` |
|
|
|
### Contact |
|
COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department <br> |
|
https://cosmos.yildiz.edu.tr/ <br> |
|
[email protected] <br> |