Update README.md
---
license: apache-2.0
datasets:
- HuggingFaceM4/the_cauldron
- AnyModal/flickr30k
- openbmb/RLAIF-V-Dataset
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
- facebook/dino-vitb16
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- vqa
- vlm
---

<p align="center">
<img src="https://github.com/mkturkcan/femtovlm/blob/main/assets/logo.png?raw=true" width="180" />
</p>
<h1 align="center">
<p>mehmetkeremturkcan/FemtoVLM-DINO</p>
</h1>
<h3 align="center">
<p>FemtoVLM: Tiniest Vision Language Models</p>
</h3>

FemtoVLM is the smallest visual question answering and captioning model in the world. It accepts image and text inputs and produces text outputs, so it can answer questions about images and describe visual content. It is designed for efficiency: its lightweight architecture makes it suitable for on-device applications while maintaining strong performance.

FemtoVLM comes in four sizes: 116M (femto), 143M (tiny), 160M (base), and 225M (dino). All models are trained for image captioning and question answering in real-world contexts. FemtoVLM cannot perform optical character recognition (OCR), multi-turn question answering, or scientific question answering.
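
To give a rough sense of what these parameter counts mean for on-device use, the sketch below estimates the storage needed for the weights alone. The bytes-per-parameter values are assumptions about common storage formats, not a statement about how the released checkpoints are stored, and the estimate ignores activations and runtime overhead.

```python
# Parameter counts as listed above, in millions of parameters.
PARAMS_MILLIONS = {"femto": 116, "tiny": 143, "base": 160, "dino": 225}

# Assumed bytes per parameter for common storage formats (weights only).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(variant: str, dtype: str = "float16") -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    n_params = PARAMS_MILLIONS[variant] * 1_000_000
    return n_params * BYTES_PER_PARAM[dtype] / 1_000_000


if __name__ == "__main__":
    for name in PARAMS_MILLIONS:
        sizes = ", ".join(
            f"{dtype}: {weight_memory_mb(name, dtype):.0f} MB"
            for dtype in BYTES_PER_PARAM
        )
        print(f"{name:>5} -> {sizes}")
```

Under these assumptions, even the largest variant needs well under 1 GB for its weights in 32-bit floats.
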
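## Setup
```bash
# Schedule-free optimizers from Facebook Research
pip install git+https://github.com/facebookresearch/schedule_free.git
# Parameter-efficient fine-tuning library
pip install peft
# Clone the seers code and enter its inner package folder
git clone https://github.com/mkturkcan/seers.git
cd seers/seers/
# Clone the model repository here (large weight files are typically stored with Git LFS)
git clone https://huggingface.co/mehmetkeremturkcan/FemtoVLM-DINO
```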
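
Before running the scripts, it can help to confirm that the Python packages are importable. The following is a minimal sketch, not part of the seers repository: the helper name `missing_modules` is hypothetical, and the module list is an assumption based on the install commands above plus the `transformers` library named in the model card metadata.

```python
import importlib.util

# Assumed import names: "schedulefree" is the import name of the
# schedule_free repository; "peft" and "transformers" match their pip names.
REQUIRED_MODULES = ("schedulefree", "peft", "transformers")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks up each module without importing it, so it runs quickly and does not load any model code.
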
## Test
From the `seers/seers` folder, run:
```bash
python femtovlm_inference.py
```

## Train

The [seers](https://github.com/mkturkcan/seers) training code is public. To train a model, run:
```bash
python femtovlm_train.py
```