SRDdev committed on
Commit
a93ebbd
·
verified ·
1 Parent(s): 5abf501

Update README.md

Files changed (1)
  1. README.md +0 -56
README.md CHANGED
@@ -5,59 +5,3 @@ pipeline_tag: image-to-text
5
  tags:
6
  - image-captioning
7
  ---
8
- # FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
9
-
10
- A framework designed to generate semantically rich image captions.
11
-
12
- ## Resources
13
-
14
- - 💻 **Project Page**: For more details, visit the official [project page](https://rotsteinnoam.github.io/FuseCap/).
15
-
16
- - πŸ“ **Read the Paper**: You can find the paper [here](https://arxiv.org/abs/2305.17718).
17
-
18
- - 🚀 **Demo**: Try out our BLIP-based model [demo](https://huggingface.co/spaces/noamrot/FuseCap) trained using FuseCap.
19
-
20
- - 📂 **Code Repository**: The code for FuseCap can be found in the [GitHub repository](https://github.com/RotsteinNoam/FuseCap).
21
-
22
- - πŸ—ƒοΈ **Datasets**: The fused captions datasets can be accessed from [here](https://github.com/RotsteinNoam/FuseCap#datasets).
23
-
24
- #### Running the model
25
-
26
- Our BLIP-based model can be run using the following code,
27
-
28
- ```python
29
- import requests
30
- from PIL import Image
31
- from transformers import BlipProcessor, BlipForConditionalGeneration
32
- import torch
33
-
34
- device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
35
- processor = BlipProcessor.from_pretrained("noamrot/FuseCap")
36
- model = BlipForConditionalGeneration.from_pretrained("noamrot/FuseCap").to(device)
37
-
38
- img_url = 'https://huggingface.co/spaces/noamrot/FuseCap/resolve/main/bike.jpg'
39
- raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
40
-
41
- text = "a picture of "
42
- inputs = processor(raw_image, text, return_tensors="pt").to(device)
43
-
44
- out = model.generate(**inputs, num_beams = 3)
45
- print(processor.decode(out[0], skip_special_tokens=True))
46
- ```
47
-
48
- ## Upcoming Updates
49
-
50
- The official codebase, datasets and trained models for this project will be released soon.
51
-
52
- ## BibTeX
53
-
54
- ``` Citation
55
- @article{rotstein2023fusecap,
56
- title={FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions},
57
- author={Noam Rotstein and David Bensaid and Shaked Brody and Roy Ganz and Ron Kimmel},
58
- year={2023},
59
- eprint={2305.17718},
60
- archivePrefix={arXiv},
61
- primaryClass={cs.CV}
62
- }
63
- ```