Ramon Meffert committed on
Commit 9889a50
1 Parent(s): 1f08ed2

Update readme

Files changed (2)
  1. README.md +60 -76
  2. README.old.md +93 -0
README.md CHANGED
@@ -1,93 +1,77 @@
- # nlp-flashcard-project
-
- ## Todo 2
-
- - [ ] Contexts preprocessing
- - [ ] Filter out formulas and the like
- - [ ] Split on sentences...?
- - [ ] Try more language models
- - [ ] Elasticsearch
- - [ ] CLI for answering questions
-
- ### Extra things
-
- - [ ] Huggingface spaces demo
- - [ ] Question generation for finetuning
- - [ ] Finetune the language model
-
- ## Todo for the progress meeting
-
- - [ ] Read in the data / prepare the repo
- - [ ] Proof of concept with UnifiedQA
- - [ ] Standard QA model with the dataset
- - [ ] Collect/read papers
- - [ ] Look at earlier work, get inspiration for a research direction
-
- ## Overview
-
- Most QA systems consist of two components:
-
- - A retriever, which fetches the _k_ most relevant pieces of context for the
-   question, e.g. with `tf-idf`.
- - A model that generates the answer. What exactly you use here depends on the
-   type of question answering:
-   - For **extractive QA** you use a reader;
-   - For **generative QA** you use a generator.
-
- Both are based on a language model.
-
- ## Useful info
-
- - Huggingface QA tutorial: <https://huggingface.co/docs/transformers/tasks/question_answering#finetune-with-tensorflow>
- - Overview of open-domain question answering techniques: <https://lilianweng.github.io/posts/2020-10-29-odqa/>
-
- ## Base model
-
- So far there is only a retriever that fetches the top-k relevant documents for
- a question. It does reach high similarity scores for many questions, but the
- documents it retrieves are usually not very relevant.
-
- ```bash
- poetry shell
- cd base_model
- poetry run python main.py
  ```
-
- ### Example
-
- "What is the perplexity of a language model?"
-
- > Result 1 (score: 74.10):
- > Figure 10.17 A sample alignment between sentences in English and French, with
- > sentences extracted from Antoine de Saint-Exupery's Le Petit Prince and a
- > hypothetical translation. Sentence alignment takes sentences e 1, ..., e n,
- > and f 1, ..., f n and finds minimal sets of sentences that are translations
- > of each other, including single sentence mappings like (e 1, f 1), (e 4 - f 3),
- > (e 5 - f 4), (e 6 - f 6) as well as 2-1 alignments (e 2 / e 3, f 2),
- > (e 7 / e 8 - f 7), and null alignments (f 5).
- >
- > Result 2 (score: 74.23):
- > Character or word overlap-based metrics like chrF (or BLEU, or etc.) are
- > mainly used to compare two systems, with the goal of answering questions like:
- > did the new algorithm we just invented improve our MT system? To know if the
- > difference between the chrF scores of two MT systems is a significant
- > difference, we use the paired bootstrap test, or the similar randomization
- > test.
- >
- > Result 3 (score: 74.43):
- > The model thus predicts the class negative for the test sentence.
- >
- > Result 4 (score: 74.95):
- > Translating from languages with extensive pro-drop, like Chinese or Japanese,
- > to non-pro-drop languages like English can be difficult since the model must
- > somehow identify each zero and recover who or what is being talked about in
- > order to insert the proper pronoun.
- >
- > Result 5 (score: 76.22):
- > Similarly, a recent challenge set, the WinoMT dataset (Stanovsky et al., 2019)
- > shows that MT systems perform worse when they are asked to translate sentences
- > that describe people with non-stereotypical gender roles, like "The doctor
- > asked the nurse to help her in the operation".
-
-
- ## Setting up elastic search.
+ # NLP FlashCards
+
+ ## Dependencies
+
+ Make sure you have the following tools installed:
+
+ - [Poetry](https://python-poetry.org/) for Python package management;
+ - [Docker](https://www.docker.com/get-started/) for running ElasticSearch.
+
+ Then, run the following commands:
+
+ ```sh
+ poetry install
+ docker pull docker.elastic.co/elasticsearch/elasticsearch:8.1.1
+ docker network create elastic
+ docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.1.1
+ ```
+
+ After the last command, a password for the `elastic` user should show up in the
+ terminal output (you might have to scroll up a bit). Copy this password, and
+ create a copy of the `.env.example` file and rename it to `.env`. Replace the
+ `<password>` placeholder with your copied password.
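
The variable name to use is the one already present in `.env.example`; purely as an illustrative sketch (the key name below is hypothetical, not taken from the repository), the resulting `.env` would look something like:

```sh
# Hypothetical key name only – use the variable name from .env.example
ELASTIC_PASSWORD=the-password-you-copied-from-the-docker-output
```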
+
+ Next, run the following command **from the root of the repository**:
+
+ ```sh
+ docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .
+ ```
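
This copies Elasticsearch's CA certificate into the repository root. If you want to sanity-check that the container is reachable before querying (optional, and not part of the project's own scripts), the standard Elasticsearch 8.x verification works:

```sh
# Prompts for the password of the `elastic` user; should return cluster info as JSON
curl --cacert http_ca.crt -u elastic https://localhost:9200
```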
+
+ ## Running
+
+ To make sure we're using the dependencies managed by Poetry, run `poetry shell`
+ before executing any of the following commands. Alternatively, replace any call
+ like `python file.py` with `poetry run python file.py` (but we suggest the shell
+ option, since it is much more convenient).
+
+ ### Training
+
+ N/A for now
+
+ ### Using the QA system
+
+ ⚠️ **Important** ⚠️ _If you want to run an ElasticSearch query, make sure the
+ docker container is running! You can check this by running `docker container
+ ls`. If your container shows up (it's named `es01` if you followed these
+ instructions), it's running. If not, you can run `docker start es01` to start
+ it, or start it from Docker Desktop._
+
+ To query the QA system, run any query as follows:
+
+ ```sh
+ python query.py "Why can dot product be used as a similarity metric?"
  ```
+
+ By default, the best answer along with its location in the book will be
+ returned. If you want to generate more answers (say, a top-5), you can supply
+ the `--top=5` option. The default retriever uses [FAISS](https://faiss.ai/), but
+ you can also use [ElasticSearch](https://www.elastic.co/elastic-stack/) using
+ the `--retriever=es` option.
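
For example, to get the top 5 answers from the ElasticSearch retriever (the question itself is only an illustration, taken from the old README):

```sh
# Retrieve the top 5 answers using ElasticSearch instead of the default FAISS retriever
python query.py "What is the perplexity of a language model?" --top=5 --retriever=es
```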
+
+ ### CLI overview
+
+ To get an overview of all available options, run `python query.py --help`. The
+ options are also printed below.
+
+ ```sh
+ usage: query.py [-h] [--top int] [--retriever {faiss,es}] str
+
+ positional arguments:
+   str                   The question to feed to the QA system
+
+ options:
+   -h, --help            show this help message and exit
+   --top int, -t int     The number of answers to retrieve
+   --retriever {faiss,es}, -r {faiss,es}
+                         The retrieval method to use
+ ```

README.old.md ADDED
@@ -0,0 +1,93 @@
+ # nlp-flashcard-project
+
+ ## Todo 2
+
+ - [ ] Contexts preprocessing
+ - [ ] Filter out formulas and the like
+ - [ ] Split on sentences...?
+ - [ ] Try more language models
+ - [ ] Elasticsearch
+ - [ ] CLI for answering questions
+
+ ### Extra things
+
+ - [ ] Huggingface spaces demo
+ - [ ] Question generation for finetuning
+ - [ ] Finetune the language model
+
+ ## Todo for the progress meeting
+
+ - [ ] Read in the data / prepare the repo
+ - [ ] Proof of concept with UnifiedQA
+ - [ ] Standard QA model with the dataset
+ - [ ] Collect/read papers
+ - [ ] Look at earlier work, get inspiration for a research direction
+
+ ## Overview
+
+ Most QA systems consist of two components:
+
+ - A retriever, which fetches the _k_ most relevant pieces of context for the
+   question, e.g. with `tf-idf`.
+ - A model that generates the answer. What exactly you use here depends on the
+   type of question answering:
+   - For **extractive QA** you use a reader;
+   - For **generative QA** you use a generator.
+
+ Both are based on a language model.
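
As a minimal illustration of the retriever idea described above (not this project's actual implementation), a `tf-idf` top-k lookup could be sketched with scikit-learn roughly like this:

```python
# Illustrative tf-idf retriever sketch; assumes scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy contexts standing in for passages from the book
contexts = [
    "Perplexity is a common intrinsic evaluation metric for language models.",
    "Sentence alignment maps sentences between parallel texts.",
    "The paired bootstrap test is used to compare two MT systems.",
]

vectorizer = TfidfVectorizer()
context_vectors = vectorizer.fit_transform(contexts)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k contexts most similar to the question under tf-idf cosine similarity."""
    question_vector = vectorizer.transform([question])
    scores = cosine_similarity(question_vector, context_vectors)[0]
    top_k = scores.argsort()[::-1][:k]
    return [contexts[i] for i in top_k]

print(retrieve("What is the perplexity of a language model?"))
```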
+
+ ## Useful info
+
+ - Huggingface QA tutorial: <https://huggingface.co/docs/transformers/tasks/question_answering#finetune-with-tensorflow>
+ - Overview of open-domain question answering techniques: <https://lilianweng.github.io/posts/2020-10-29-odqa/>
+
+ ## Base model
+
+ So far there is only a retriever that fetches the top-k relevant documents for
+ a question. It does reach high similarity scores for many questions, but the
+ documents it retrieves are usually not very relevant.
+
+ ```bash
+ poetry shell
+ cd base_model
+ poetry run python main.py
+ ```
+
+ ### Example
+
+ "What is the perplexity of a language model?"
+
+ > Result 1 (score: 74.10):
+ > Figure 10.17 A sample alignment between sentences in English and French, with
+ > sentences extracted from Antoine de Saint-Exupery's Le Petit Prince and a
+ > hypothetical translation. Sentence alignment takes sentences e 1, ..., e n,
+ > and f 1, ..., f n and finds minimal sets of sentences that are translations
+ > of each other, including single sentence mappings like (e 1, f 1), (e 4 - f 3),
+ > (e 5 - f 4), (e 6 - f 6) as well as 2-1 alignments (e 2 / e 3, f 2),
+ > (e 7 / e 8 - f 7), and null alignments (f 5).
+ >
+ > Result 2 (score: 74.23):
+ > Character or word overlap-based metrics like chrF (or BLEU, or etc.) are
+ > mainly used to compare two systems, with the goal of answering questions like:
+ > did the new algorithm we just invented improve our MT system? To know if the
+ > difference between the chrF scores of two MT systems is a significant
+ > difference, we use the paired bootstrap test, or the similar randomization
+ > test.
+ >
+ > Result 3 (score: 74.43):
+ > The model thus predicts the class negative for the test sentence.
+ >
+ > Result 4 (score: 74.95):
+ > Translating from languages with extensive pro-drop, like Chinese or Japanese,
+ > to non-pro-drop languages like English can be difficult since the model must
+ > somehow identify each zero and recover who or what is being talked about in
+ > order to insert the proper pronoun.
+ >
+ > Result 5 (score: 76.22):
+ > Similarly, a recent challenge set, the WinoMT dataset (Stanovsky et al., 2019)
+ > shows that MT systems perform worse when they are asked to translate sentences
+ > that describe people with non-stereotypical gender roles, like "The doctor
+ > asked the nurse to help her in the operation".
+
+
+ ## Setting up elastic search.