Update names
- README.md +3 -3
- README_ja.md +1 -1
README.md
CHANGED
@@ -7,14 +7,12 @@ pipeline_tag: text-classification
 library_name: fasttext
 ---
 
-# Swallow
+# Swallow Education Classifier
 
 [日本語版の README はこちら](https://huggingface.co/tokyotech-llm/edu-classifier/blob/main/README_ja.md)
 
 ## Model summary
 
-**NOTE**: This classifier is designed to work only with **Japanese** text. Quality for English or other languages is not guaranteed.
-
 This repository contains fastText classifiers for judging the educational value of Japanese web pages. It includes two types of classifiers:
 
 1. **Wiki-based classifier**: trained on Japanese Wikipedia text in academic categories.
@@ -24,6 +22,8 @@ The Wiki-based classifier is distributed under the [CC BY-SA 4.0](https://huggin
 
 These classifiers were employed for quality-filtering process in the Swallow Corpus Version 2\*, which was used to train the [Llama 3.1 Swallow](https://huggingface.co/collections/tokyotech-llm/llama-31-swallow-66fd4f7da32705cadd1d5bc6) series. Our experiments demonstrated that applying filtering based on the classifier’s scores enabled more effective improvements in the LLM’s Japanese knowledge, even with the same computational resources.
 
+**NOTE**: This classifier is designed to work with Japanese text. Its functionality and quality are not guaranteed for non-Japanese languages, including English.
+
 \* A large Japanese web corpus extracted from Common Crawl
 
 ### How to use
README_ja.md
CHANGED
@@ -1,4 +1,4 @@
-# Swallow
+# Swallow Education Classifier
 
 ## 概要
 