howanching-clara committed on
Commit a0c74e3
Parent(s): 14af5bf
Update README.md
README.md CHANGED
```diff
@@ -31,8 +31,7 @@ It achieves the following results on the evaluation set:
 ## Model description
 
 The model is fine-tuned with academic publications in Linguistics, to classify texts in publications into 4 classes as a filter to other tasks.
-
-The 4 classes:
+Sentence-based data obtained from OCR-processed PDF files was annotated manually with the following classes:
 - 0: out of scope - materials that are of low significance, eg. page number and page header, noise from OCR/pdf-to-text convertion
 - 1: main text - texts that are the main texts of the publication, to be used for down-stream tasks
 - 2: examples - texts that are captions of the figures, or quotes or excerpts
```
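The model description frames the classifier as a filter that keeps only main text (class 1) for downstream tasks. The sketch below illustrates that filtering step under stated assumptions: `classify` is a hypothetical stand-in for the fine-tuned model (any callable returning one of the integer class IDs), and `filter_main_text` and `toy_classify` are illustrative names, not part of the model or any library API. Only classes 0 to 2 appear in this hunk, so the sketch does not name the fourth class.

```python
from typing import Callable, Iterable, List

# Class IDs as listed in the model description (class 3 is not shown in this hunk).
OUT_OF_SCOPE = 0  # page numbers, page headers, OCR / pdf-to-text noise
MAIN_TEXT = 1     # main text of the publication, kept for downstream tasks
EXAMPLES = 2      # figure captions, quotes, excerpts


def filter_main_text(
    sentences: Iterable[str],
    classify: Callable[[str], int],
) -> List[str]:
    """Keep only the sentences that `classify` labels as main text.

    `classify` is a hypothetical stand-in for the fine-tuned model:
    any callable mapping a sentence to an integer class ID.
    """
    return [s for s in sentences if classify(s) == MAIN_TEXT]


if __name__ == "__main__":
    # Toy rule-based classifier that only makes the sketch runnable;
    # it is NOT the fine-tuned model.
    def toy_classify(sentence: str) -> int:
        if sentence.strip().isdigit():
            return OUT_OF_SCOPE
        if sentence.startswith("Figure"):
            return EXAMPLES
        return MAIN_TEXT

    page = [
        "12",
        "Figure 3: Vowel chart.",
        "This section describes the sample texts.",
    ]
    # Prints only the sentence the toy classifier labels as main text.
    print(filter_main_text(page, toy_classify))
```

Swapping `toy_classify` for a function that runs the actual model keeps the filtering logic unchanged, because the filter depends only on the returned class ID.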