---
license: apache-2.0
datasets:
- damerajee/Llava-pretrain-small
language:
- en
library_name: transformers
tags:
- Vision Language Model
---
# GPT-Vision
A very small Vision-Language Model, in the spirit of LLaVA and Moondream. The model combines three components into one (a minimal sketch of how they fit together follows the list):
* GPT-2 (language model)
* ViT-224 (vision encoder)
* Multimodal projector

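To make the combination concrete, here is a minimal, illustrative sketch of a LLaVA-style multimodal projector that maps ViT patch features into GPT-2's embedding space. The class name, MLP layout, and the 768-dimensional sizes are assumptions for illustration, not the repository's actual code.

```python
import torch
import torch.nn as nn

class MultimodalProjector(nn.Module):
    """Illustrative projector: ViT patch features -> GPT-2 embedding space."""
    def __init__(self, vit_dim: int = 768, gpt_dim: int = 768):
        super().__init__()
        # Small MLP, as used by LLaVA-style models (assumed architecture)
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, gpt_dim),
            nn.GELU(),
            nn.Linear(gpt_dim, gpt_dim),
        )

    def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vit_dim) -> (batch, num_patches, gpt_dim)
        return self.proj(patch_embeds)

# The projected image tokens are typically prepended to the text token
# embeddings, and GPT-2 attends over the combined sequence:
#   inputs_embeds = torch.cat([image_tokens, text_embeds], dim=1)
```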
# Inference
```python
from transformers import AutoModelForCausalLM
from PIL import Image

# trust_remote_code=True is required: the model ships its own architecture code
model = AutoModelForCausalLM.from_pretrained("damerajee/GPT-Vision", trust_remote_code=True)

# Load the image and make sure it has three channels
image_path = "Your_image_path"
image = Image.open(image_path).convert('RGB')

question = "Render a clear and concise summary of the photo."
answer = model.generate(image=image, question=question, max_new_tokens=40)
print("Answer:", answer)
```

# Limitations
Fair warning: the model can only generate very short responses, and it sometimes gets stuck repeating the same tokens. Even so, it generally understands what is in the image.
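If the model's custom `generate` forwards standard transformers sampling arguments (an assumption; the method here is remote code, so it may accept only the arguments shown above), a `repetition_penalty` may help with the token repetition:

```python
# Assumption: extra kwargs are passed through to the underlying generation
# loop. If they are not, this call will fail and the plain call above is
# the supported path.
answer = model.generate(
    image=image,
    question=question,
    max_new_tokens=40,
    repetition_penalty=1.3,  # discourage repeating recent tokens
)
```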