Update README.md
README.md CHANGED
@@ -4,7 +4,7 @@ base_model:
 ---
 
 
-# MISHANM/
+# MISHANM/ibm-granite-vision-3.2-2b-fp16
 
 The MISHANM/ibm-granite-granite-vision-3.2-2b-fp16 model is a sophisticated vision-language model designed for image-to-text generation. It leverages advanced neural architectures to transform visual inputs into coherent textual descriptions.
 
@@ -41,7 +41,7 @@ from PIL import Image
 
 device = "cuda" if torch.cuda.is_available() else "cpu"
 
-model_path = "MISHANM/ibm-granite-
+model_path = "MISHANM/ibm-granite-vision-3.2-2b-fp16"
 processor = AutoProcessor.from_pretrained(model_path)
 model = AutoModelForVision2Seq.from_pretrained(model_path, ignore_mismatched_sizes=True).to(device)
 
@@ -113,7 +113,7 @@ Users are encouraged to critically evaluate the model's outputs, especially in s
 
 ## Citation Information
 ```
-@misc{MISHANM/ibm-granite-
+@misc{MISHANM/ibm-granite-vision-3.2-2b-fp16,
 author = {Mishan Maurya},
 title = {Introducing Image to Text Generation model},
 year = {2025},
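The hunk at lines 41-47 only shows the model-loading half of the card's usage snippet. The following is a minimal sketch, not taken from the card, of how that snippet would typically be completed for a single image-to-text pass with transformers; the image path, prompt, chat-template call, and generation settings are illustrative assumptions rather than the card's exact usage.

```python
# Sketch only: completes the loading code shown in the diff with an inference pass.
# The image file, prompt text, conversation format, and max_new_tokens are assumed
# examples; the checkpoint's expected prompt format may differ.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

device = "cuda" if torch.cuda.is_available() else "cpu"

model_path = "MISHANM/ibm-granite-vision-3.2-2b-fp16"
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForVision2Seq.from_pretrained(model_path, ignore_mismatched_sizes=True).to(device)

# Load an image and build a chat-style prompt (hypothetical inputs).
image = Image.open("example.jpg").convert("RGB")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Preprocess image and text together, then generate a textual description.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

If the checkpoint's processor does not ship a chat template, the apply_chat_template call would be replaced by manually building the prompt string in whatever format the model was trained on.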