---
language:
- ja
tags:
- vibrato
---

# Vibrato Model Archive

This repository hosts all of the models distributed with the Vibrato GitHub release. Models that the release ships as zstd-compressed files have already been decompressed here, so they can be downloaded and used directly.

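Because the files here are already decompressed, a `system.dic` downloaded from this repository can be passed to `vibrato.Vibrato` as-is. If you are unsure whether a local dictionary file came from here or is still a raw `.zst` file from the GitHub release, a minimal sketch can check for the Zstandard frame magic number (bytes `28 B5 2F FD`, as specified in RFC 8878) before loading it. The helpers `is_zstd_compressed` and `load_tokenizer` are hypothetical and not part of Vibrato.

```python
from pathlib import Path

# First four bytes of every Zstandard frame (magic number 0xFD2FB528, little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd_compressed(data: bytes) -> bool:
    """Return True if `data` starts with a Zstandard frame header."""
    return data[:4] == ZSTD_MAGIC


def load_tokenizer(path: str):
    """Load an uncompressed Vibrato dictionary from `path` (hypothetical helper)."""
    data = Path(path).read_bytes()
    if is_zstd_compressed(data):
        raise ValueError(f"{path} is still zstd-compressed; decompress it first")
    # Imported here so is_zstd_compressed() can be used without vibrato installed.
    import vibrato

    return vibrato.Vibrato(data)
```

Dictionaries from this repository should never trigger the error branch; it only guards against mixing in files taken straight from the release.
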
> Important: This repository deliberately does not declare a single license, because the models it contains are distributed under different licensing terms.
> Most models appear to be BSD-licensed, but some do not use a standard license.
> Check the license file in each model's folder before using that model.

If you only need models under the BSD license, see [aha-org/vibrato-models-bsdonly](https://huggingface.co/aha-org/vibrato-models-bsdonly).

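Checking licenses can also be scripted. The sketch below lists the repository's files with `huggingface_hub.list_repo_files` and downloads the ones in a model folder that look like license texts. The filename prefixes (`LICENSE`, `LICENCE`, `COPYING`) are an assumption about how the license files are named, and `license_paths` and `download_licenses` are hypothetical helpers, so read the downloaded files yourself rather than relying on the match.

```python
# Assumed filename prefixes for license texts; adjust if a folder uses another name.
LICENSE_PREFIXES = ("license", "licence", "copying")


def license_paths(repo_files: list[str], model_name: str) -> list[str]:
    """Return paths directly inside `model_name/` whose names look like license files."""
    prefix = model_name + "/"
    matches = []
    for path in repo_files:
        if not path.startswith(prefix):
            continue
        name = path[len(prefix):]
        # Skip nested paths and anything not named like a license file.
        if "/" not in name and name.lower().startswith(LICENSE_PREFIXES):
            matches.append(path)
    return sorted(matches)


def download_licenses(model_name: str, repo_id: str = "ryan-minato/vibrato-models") -> list[str]:
    """Download the matched license files and return their local paths."""
    # Imported lazily so license_paths() can be used without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    files = list_repo_files(repo_id)
    return [hf_hub_download(repo_id, path) for path in license_paths(files, model_name)]
```

`download_licenses` needs network access. An empty result only means that no file matched the assumed prefixes, not that the model is unlicensed.
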
## Available Models

- bccwj-suw+unidic-cwj-3_1_1+compact-dual
- bccwj-suw+unidic-cwj-3_1_1+compact
- bccwj-suw+unidic-cwj-3_1_1-extracted+compact-dual
- bccwj-suw+unidic-cwj-3_1_1-extracted+compact
- bccwj-suw+unidic-cwj-3_1_1
- ipadic-mecab-2_7_0-small
- ipadic-mecab-2_7_0
- jumandic-mecab-7_0
- naist-jdic-mecab-0_6_3b
- unidic-cwj-3_1_1+compact-dual
- unidic-cwj-3_1_1+compact
- unidic-cwj-3_1_1
- unidic-mecab-2_1_2

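The names above are the values to substitute for `<<model_name>>` in the usage example. To check which folders the repository actually contains, a minimal sketch can list its files and keep the top-level folders that hold a `system.dic`. This assumes every model is stored as `<model_name>/system.dic`, as the usage example implies; `model_names` and `list_models` are hypothetical helpers.

```python
def model_names(repo_files: list[str]) -> list[str]:
    """Return the sorted folder names that directly contain a `system.dic`."""
    names = set()
    for path in repo_files:
        folder, _, filename = path.partition("/")
        # Keep only paths of the exact form "<folder>/system.dic".
        if folder and filename == "system.dic":
            names.add(folder)
    return sorted(names)


def list_models(repo_id: str = "ryan-minato/vibrato-models") -> list[str]:
    """Query the Hub for the repository's files and return the model folder names."""
    # Imported lazily so model_names() can be used without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return model_names(list_repo_files(repo_id))
```

`list_models()` requires network access; `model_names` is split out so the filtering logic can be reused on any file listing.
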
## Usage

```python
from huggingface_hub import hf_hub_download
import vibrato

# Download the dictionary into the local Hugging Face cache, then load the tokenizer.
# Replace <<model_name>> with one of the model names listed above.
model_path = hf_hub_download("ryan-minato/vibrato-models", "<<model_name>>/system.dic")
with open(model_path, "rb") as f:
    tokenizer = vibrato.Vibrato(f.read())

text = """\
「四十二だと!」ルーンクォールが叫んだ。
「七百五十万年かけて、それだけか?」
「何度も徹底的に検算しました」コンピュータが応じた。
「まちがいなくそれが答えです。率直なところ、みなさんのほうで究極の疑問が何であるかわかっていなかったところに問題があるのです」
"""

# Tokenize the text and print the surface form of each token.
tokens = tokenizer.tokenize(text)
print([token.surface() for token in tokens])
```
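
`tokenize` returns a sequence of tokens; in the python-vibrato bindings each token exposes `surface()` (the matched text) and `feature()` (the dictionary's comma-separated feature string). Feature layouts differ between dictionaries, so the sketch below assumes only that the first field is the top-level part of speech, which is typical of MeCab-style dictionaries such as IPADIC and UniDic but worth confirming for the model you load. `top_level_pos` and `surface_pos_pairs` are hypothetical helpers.

```python
def top_level_pos(feature: str) -> str:
    """Return the first comma-separated field of a Vibrato feature string."""
    return feature.split(",", 1)[0]


def surface_pos_pairs(tokens) -> list[tuple[str, str]]:
    """Pair each token's surface form with its top-level part of speech.

    `tokens` is any iterable whose items have surface() and feature()
    methods, such as the result of tokenizer.tokenize(text).
    """
    return [(token.surface(), top_level_pos(token.feature())) for token in tokens]
```

With the `tokenizer` and `text` from the usage example, `surface_pos_pairs(tokenizer.tokenize(text))` returns a list of `(surface, part_of_speech)` tuples that can then be filtered, for example to keep only nouns (`名詞`).
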