Update README.md
README.md
CHANGED
@@ -9,7 +9,9 @@ license: mit
|
|
9 |
|
10 |
<!-- Provide a quick summary of what the model is/does. -->
|
11 |
|
12 |
-
ViPE: Visualize Pretty-much Everything, is the first automated model for translating any arbitrary piece of text into a visualizable prompt.
|
|
|
|
|
13 |
|
14 |
### Model Description
|
15 |
|
@@ -101,18 +103,28 @@ However, a semicolon draws a stronger boundary between the keywords and encourag
|
|
101 |
### Training Data
|
102 |
|
103 |
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
104 |
-
|
105 |
-
|
106 |
-
|
107 |
### Training Procedure
|
108 |
|
|
|
|
|
|
|
109 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
110 |
|
111 |
-
|
112 |
## Evaluation
|
113 |
-
|
114 |
-
|
115 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
116 |
|
117 |
## Citation
|
118 |
|
|
|
9 |
|
10 |
<!-- Provide a quick summary of what the model is/does. -->
|
11 |
|
12 |
+
ViPE (Visualize Pretty-much Everything) is the first automated model for translating any arbitrary piece of text into a visualizable prompt.
|
13 |
+
It helps any text-to-image model visualize figurative or non-lexical language. It has been shown to be more robust than GPT-3.5 Turbo (ChatGPT)
|
14 |
+
in generating depictable and semantically meaningful prompts.
|
15 |
|
16 |
### Model Description
|
17 |
|
|
|
103 |
### Training Data
|
104 |
|
105 |
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
106 |
+
- LyricCanvas dataset: a synthetically generated dataset (to be published soon)
|
107 |
+
|
|
|
108 |
### Training Procedure
|
109 |
|
110 |
+
ViPE was trained with the standard auto-regressive objective: given a line (or lines) of lyrics as a prefix, the objective is to generate a plausible
|
111 |
+
prompt that is both depictable and semantically related to the given lyric(s). The loss function excludes the tokens corresponding to the lyrics, so ViPE
|
112 |
+
never generates any original lyrics and only learns to generate visually related prompts.
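
The prefix masking described above can be sketched as follows. This is a minimal, framework-free illustration and not the actual ViPE training code: the helper names `build_labels` and `masked_nll`, the token ids, and the `-100` ignore value (borrowed from a common PyTorch convention) are all assumptions made for the example. It also assumes `token_log_probs[i]` is the log-probability the model assigned to the token at position `i` given the preceding tokens.

```python
import math

# Assumed ignore value; mirrors the default ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100


def build_labels(lyric_ids, prompt_ids):
    """Concatenate lyric prefix and target prompt; mask the prefix in the labels."""
    input_ids = list(lyric_ids) + list(prompt_ids)
    labels = [IGNORE_INDEX] * len(lyric_ids) + list(prompt_ids)
    return input_ids, labels


def masked_nll(token_log_probs, labels):
    """Mean negative log-likelihood over the unmasked (prompt) positions only."""
    kept = [-lp for lp, y in zip(token_log_probs, labels) if y != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical token ids: three lyric tokens followed by two prompt tokens.
input_ids, labels = build_labels([11, 12, 13], [21, 22])

# Hypothetical per-token log-probabilities; the lyric positions do not affect the loss.
log_probs = [math.log(0.1)] * 3 + [math.log(0.5)] * 2
loss = masked_nll(log_probs, labels)
```

Because the lyric positions are masked, changing their log-probabilities leaves `loss` unchanged, which is why the model receives no training signal for reproducing lyrics.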
|
113 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
114 |
|
|
|
115 |
## Evaluation
|
116 |
+
In all of the following evaluations, ViPE is consistently more robust than ChatGPT and performs competitively with human experts.
|
117 |
+
|
118 |
+
- ***Intrinsic evaluations***
|
119 |
+
- General understanding of figurative language using the [Fig-QA dataset](https://huggingface.co/datasets/nightingal3/fig-qa)
|
120 |
+
- ***Extrinsic evaluations***
|
121 |
+
- Image-text retrieval on the [HAIVMet dataset](https://aclanthology.org/2023.findings-acl.465.pdf)
|
122 |
+
- Emotion visualizations: how well ViPE translates emotionally charged tweets into a depictable description of a scene in comparison with
|
123 |
+
ChatGPT. The [Emotion dataset](https://huggingface.co/datasets/dair-ai/emotion) is used.
|
124 |
+
- ***Human evaluations***
|
125 |
+
- We conducted a user study involving 30 native English-speaking participants aged between 20 and 40. Participants were
|
126 |
+
presented with 3 images and a metaphor from the HAIVMet dataset. They were asked to select the image that best matches the metaphor.
|
127 |
+
The images were generated using prompts from ViPE, ChatGPT, and human experts (HAIVMet).
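
One simple way to summarize such a study is to tally which prompt source produced the image each participant picked. The sketch below is only an illustration under that assumption: the function name `preference_shares` is hypothetical, the votes are made up for the example, and nothing here reproduces the actual study data or its analysis.

```python
from collections import Counter


def preference_shares(choices):
    """Return the fraction of votes received by each prompt source."""
    counts = Counter(choices)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.items()}


# Hypothetical votes for illustration only; not data from the user study.
votes = ["ViPE", "HAIVMet", "ViPE", "ChatGPT"]
shares = preference_shares(votes)
```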
|
128 |
|
129 |
## Citation
|
130 |
|