Commit · fe75ac3
1 Parent(s): ea304d0
Update abstract
- sections/abstract.md +3 -1
sections/abstract.md
CHANGED
@@ -1,4 +1,6 @@
 ## Abstract
 This project is focused on Multilingual Visual Question Answering. Most of the existing datasets and models on this task work with English-only image-text pairs. Our intention here is to provide a Proof-of-Concept with our simple CLIP Vision + BERT model which can be trained on multilingual text checkpoints with pre-trained image encoders and made to perform well enough.
 
-Due to lack of good-quality multilingual data, we translate subsets of the Conceptual 12M dataset into English (already in English), French, German and Spanish using the mBART-50 models.
+Due to lack of good-quality multilingual data, we translate subsets of the Conceptual 12M dataset into English (already in English), French, German and Spanish using the mBART-50 models. We get an eval accuracy of 0.69 on the MLM task.
+
+We achieved 0.49 accuracy on the multilingual validation set of VQAv2 we created using Marian models. With better captions, and hyperparameter-tuning, we expect to see higher performance.