Commit e4d598b, committed by aya-se · 1 Parent(s): d2048a3

Update names

Files changed (2)
  1. README.md +3 -3
  2. README_ja.md +1 -1
README.md CHANGED
@@ -7,14 +7,12 @@ pipeline_tag: text-classification
  library_name: fasttext
  ---

- # Swallow Edu Classifier
+ # Swallow Education Classifier

  [日本語版の README はこちら](https://huggingface.co/tokyotech-llm/edu-classifier/blob/main/README_ja.md)

  ## Model summary

- **NOTE**: This classifier is designed to work only with **Japanese** text. Quality for English or other languages is not guaranteed.
-
  This repository contains fastText classifiers for judging the educational value of Japanese web pages. It includes two types of classifiers:

  1. **Wiki-based classifier**: trained on Japanese Wikipedia text in academic categories.
@@ -24,6 +22,8 @@ The Wiki-based classifier is distributed under the [CC BY-SA 4.0](https://huggin

  These classifiers were employed for quality-filtering process in the Swallow Corpus Version 2\*, which was used to train the [Llama 3.1 Swallow](https://huggingface.co/collections/tokyotech-llm/llama-31-swallow-66fd4f7da32705cadd1d5bc6) series. Our experiments demonstrated that applying filtering based on the classifier’s scores enabled more effective improvements in the LLM’s Japanese knowledge, even with the same computational resources.

+ **NOTE**: This classifier is designed to work with Japanese text. Its functionality and quality are not guaranteed for non-Japanese languages, including English.
+
  \* A large Japanese web corpus extracted from Common Crawl

  ### How to use
README_ja.md CHANGED
@@ -1,4 +1,4 @@
- # Swallow Edu Classifier
+ # Swallow Education Classifier

  ## 概要
