---
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: llama3
base_model: rinna/llama-3-youko-8b
datasets:
- mc4
- wikipedia
- EleutherAI/pile
- oscar-corpus/colossal-oscar-1.0
- cc100
language:
- ja
- en
inference: false
pipeline_tag: text-generation
---

# QuantFactory/llama-3-youko-8b-GGUF

This is a quantized version of [rinna/llama-3-youko-8b](https://huggingface.co/rinna/llama-3-youko-8b) created using llama.cpp.
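
As a minimal usage sketch (not taken from the original card): a downloaded GGUF file can be loaded with the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings to llama.cpp. The filename and prompt below are placeholders; substitute the quantization level you actually downloaded.

```python
from llama_cpp import Llama

# Path to a downloaded GGUF file; the exact name depends on the
# quantization level (Q4_K_M here is an assumed example).
llm = Llama(model_path="llama-3-youko-8b.Q4_K_M.gguf", n_ctx=4096)

# Youko is a base (not instruction-tuned) model, so plain text
# completion is the natural interface.
out = llm("西田幾倚郎は、", max_tokens=128, temperature=0.8)
print(out["choices"][0]["text"])
```

The same file also runs directly with the llama.cpp CLI, e.g. `llama-cli -m llama-3-youko-8b.Q4_K_M.gguf -p "<prompt>"`.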

# Model Description

![rinna-icon](./rinna.png)

# Overview

We conduct continual pre-training of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on **22B** tokens from a mixture of Japanese and English datasets. The continual pre-training significantly improves the model's performance on Japanese tasks.

The name `youko` comes from the Japanese word [`妖狐/γ‚ˆγ†γ“/Youko`](https://ja.wikipedia.org/wiki/%E5%A6%96%E7%8B%90), a mythical fox spirit, which is a kind of Japanese supernatural creature ([`ε¦–ζ€ͺ/γ‚ˆγ†γ‹γ„/Youkai`](https://ja.wikipedia.org/wiki/%E5%A6%96%E6%80%AA)).

* **Library**

    The model was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).

* **Model architecture**

    A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [Llama 3 Model Card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for architecture details; a short sketch for verifying these values programmatically follows this list.

* **Training: Built with Meta Llama 3**

    The model was initialized with the [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model and continually trained on around **22B** tokens from a mixture of the following corpora:
    - [Japanese CC-100](https://huggingface.co/datasets/cc100)
    - [Japanese C4](https://huggingface.co/datasets/mc4)
    - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
    - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
    - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
    - rinna's curated Japanese dataset

* **Contributors**

    - [Koh Mitsuda](https://huggingface.co/mitsu-koh)
    - [Kei Sawada](https://huggingface.co/keisawada)
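
As referenced under **Model architecture**, the layer count and hidden size can be checked against the original model's published configuration. A minimal sketch, assuming `transformers` is installed and the Hub repo is reachable:

```python
from transformers import AutoConfig

# Inspect the configuration of the original (non-quantized) model.
config = AutoConfig.from_pretrained("rinna/llama-3-youko-8b")

print(config.num_hidden_layers)  # 32 layers, as stated above
print(config.hidden_size)        # 4096 hidden size, as stated above
```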

---

# Benchmarking

Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).

---

# Tokenization

The model uses the original meta-llama/Meta-Llama-3-8B tokenizer.
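
As a hedged sketch of what this means in practice (assuming the rinna repo ships the unmodified tokenizer files, per the statement above; the upstream meta-llama repo is gated):

```python
from transformers import AutoTokenizer

# rinna/llama-3-youko-8b uses the original Meta-Llama-3-8B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("rinna/llama-3-youko-8b")

ids = tokenizer.encode("妖狐は日本の妖怪です。")
print(ids)  # token IDs (a BOS token is prepended by default)
print(tokenizer.decode(ids, skip_special_tokens=True))  # round-trips to the original text
```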

---

# How to cite the original model

```bibtex
@misc{rinna-llama-3-youko-8b,
    title = {rinna/llama-3-youko-8b},
    author = {Mitsuda, Koh and Sawada, Kei},
    url = {https://huggingface.co/rinna/llama-3-youko-8b},
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    url = {https://arxiv.org/abs/2404.01657},
}
```

---

# References

```bibtex
@article{llama3modelcard,
    title = {Llama 3 Model Card},
    author = {AI@Meta},
    year = {2024},
    url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md},
}

@software{gpt-neox-library,
    title = {{GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch}},
    author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
    doi = {10.5281/zenodo.5879544},
    month = {8},
    year = {2021},
    version = {0.0.1},
    url = {https://www.github.com/eleutherai/gpt-neox},
}
```

---

# License

[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)