---
license: cc-by-nc-4.0
datasets:
- oeg/CelebA_Sent2Vect_Sp
language:
- es
tags:
- CelebA
- Spanish
- celebFaces Attributes
---
# Sent2vec trained with data from the descriptive text corpus of the CelebA dataset
## Overview
- **Language**: Spanish
- **Data**: [CelebA_Sent2Vect_Sp](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp)
- **Architecture**: Sent2vec
- **Paper**: [Information Processing and Management](https://doi.org/10.1016/j.ipm.2024.103667)
## Description
Sent2vec can be used directly on English text: all you have to do is download the library and pass in the text to be encoded, since most
of these models were trained on English. Because this work deals with text in Spanish, the model had to be
trained from scratch in that language. Training used the generated corpus ([in this repository](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp))
and followed this process:
- A corpus of descriptive sentences in Spanish was generated, covering the characteristics of each face in the CelebA dataset.
A total of 192,209 sentences are available for training.
- Pre-processing removed accents. _Stopwords_ and connectors were retained as part of the sentence structure during training.
- The _Sent2vec_ and _FastText_ libraries were installed and the parameters configured. The values were set empirically after several
tests: 4,800-dimensional feature vectors, 5,000 epochs, 200 threads, 2 n-grams, and a learning rate of 0.05.
With this configuration, training took 7 hours in total with all CPUs running at maximum performance.
The result is a _bin_ file, which can be downloaded from this repository.
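The parameter values listed above could map onto a Sent2vec training command roughly as follows. This is only a sketch: the exact command used by the authors is not given here, the corpus file name, output prefix, and binary path are hypothetical, and mapping "2 n-grams" to `-wordNgrams 2` is an assumption. The snippet only assembles and prints the command; it does not start training.

```python
import shlex


def build_training_command(corpus_path, output_prefix, fasttext_bin="./fasttext"):
    """Assemble a `fasttext sent2vec` call with the hyperparameters reported above."""
    return [
        fasttext_bin, "sent2vec",
        "-input", corpus_path,      # one sentence per line (hypothetical file name)
        "-output", output_prefix,   # prefix of the model file to write
        "-dim", "4800",             # dimensions of the feature vectors
        "-epoch", "5000",
        "-lr", "0.05",
        "-wordNgrams", "2",         # assumed meaning of "2 n-grams"
        "-thread", "200",
    ]


command = build_training_command("celeba_es_corpus.txt", "sent2vec_celebAEs-UNI")
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(command, check=True)` once the Sent2vec binary has been built.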
## How to use
Download the model file, **sent2vec_celebAEs-UNI.bin**, and load it with the _sent2vec_ library in Python as follows:
```python
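import sent2vec

# Path to the downloaded model file
model_path = "sent2vec_celebAEs-UNI.bin"
# Load the trained Spanish model
s2vmodel = sent2vec.Sent2vecModel()
s2vmodel.load_model(model_path)
# Spanish caption describing a face, written without accents as in the training corpus
caption = """El hombre luce una sombra a las 5 en punto. Su cabello es de color negro. Tiene una nariz grande con cejas tupidas. El hombre se ve atractivo"""
# Encode the caption into a sentence embedding
vector = s2vmodel.embed_sentence(caption)
print(vector)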
```
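Because accents were removed from the training corpus, it may help to normalize input captions the same way before embedding them. The following is a minimal sketch, assuming a Unicode-decomposition approach: `strip_accents` is a hypothetical helper, not part of _sent2vec_, and the exact pre-processing used for training may differ (for example, this version also turns `ñ` into `n`).

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accent marks while keeping the base letters."""
    # NFD splits each accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents to drop.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


caption = "La mujer está sonriendo y tiene el cabello castaño."
print(strip_accents(caption))
```

The cleaned string can then be passed to `embed_sentence` in place of the raw caption.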
## Results
The encoder returns a numeric vector with 4,800 dimensions.
```python
>>> print(vector)
[[0.1,0.87,0.51,........0.7]]
>>> len(vector[0])
4800
```
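One common way to use such vectors is to compare two captions by cosine similarity. The sketch below is illustrative only: `cosine_similarity` is a hypothetical helper written in plain Python, and the short toy vectors stand in for the 4,800-dimensional rows returned by `embed_sentence` (for example `vector[0]`).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Treat an all-zero vector as having no similarity to anything.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for two caption embeddings.
first = [1.0, 0.0, 1.0]
second = [1.0, 0.0, 0.0]
print(cosine_similarity(first, first))
print(cosine_similarity(first, second))
```

Values close to 1 indicate captions whose embeddings point in nearly the same direction.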
For detailed information on using the trained model, see the [following link](https://github.com/eduar03yauri/DCGAN-text2face-forSpanish/blob/main/Data/encoder-models/Sent2vec_model_trained.md).
## Licensing information
This model is available under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.es) license.
## Citation information
**Citing**: If you use the Sent2vec+CelebA model in your work, please cite the paper published in **[Information Processing and Management](https://doi.org/10.1016/j.ipm.2024.103667)**:
```bib
@article{YAURILOZANO2024103667,
title = {Generative Adversarial Networks for text-to-face synthesis \& generation: A quantitative–qualitative analysis of Natural Language Processing encoders for Spanish},
journal = {Information Processing \& Management},
volume = {61},
number = {3},
pages = {103667},
year = {2024},
issn = {0306-4573},
doi = {https://doi.org/10.1016/j.ipm.2024.103667},
url = {https://www.sciencedirect.com/science/article/pii/S030645732400027X},
author = {Eduardo Yauri-Lozano and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro}
}
```
## Authors
- [Eduardo Yauri Lozano](https://github.com/eduar03yauri)
- [Manuel Castillo-Cara](https://github.com/manwestc)
- [Raúl García-Castro](https://github.com/rgcmme)
[*Universidad Nacional de Ingeniería*](https://www.uni.edu.pe/), [*Ontology Engineering Group*](https://oeg.fi.upm.es/), [*Universidad Politécnica de Madrid.*](https://www.upm.es/internacional)
## Contributors
See the full list of contributors [here](https://github.com/eduar03yauri/DCGAN-text2face-forSpanish).
<kbd><img src="https://www.uni.edu.pe/images/logos/logo_uni_2016.png" alt="Universidad Nacional de Ingeniería" width="100"></kbd>
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-oeg.png" alt="Ontology Engineering Group" width="100"></kbd>
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-upm.png" alt="Universidad Politécnica de Madrid" width="100"></kbd>