Update README.md
README.md
CHANGED
@@ -9,7 +9,7 @@ license: gpl-3.0
 ---
 
 # Slovak GPT-J-405M
-Slovak GPT-J-405M is the second model released in Slovak GPT-J series after its smaller variant [Slovak GPT-J-162M](https://huggingface.co/Milos/slovak-gpt-j-162M).
+Slovak GPT-J-405M is the second model released in the Slovak GPT-J series after its smaller variant [Slovak GPT-J-162M](https://huggingface.co/Milos/slovak-gpt-j-162M). Since then, a larger [Slovak GPT-J-1.4B](https://huggingface.co/Milos/slovak-gpt-j-1.4B) has been released.
 ## Model Description
 Model is based on [GPT-J](https://github.com/kingoflolz/mesh-transformer-jax/) and has over 405M trainable parameters.
 
@@ -37,7 +37,7 @@ The dataset was preprocessed and cleaned in a specific way that involves minor b
 
 ## Training procedure
 
-This model was trained for a bit more than 36.5 billion tokens over 69,001 steps on TPU v3-8 pod. The cross-entropy validation loss at the last step was 2.821
+This model was trained for a bit more than 36.5 billion tokens over 69,001 steps on a TPU v3-8 pod. The cross-entropy validation loss at the last step was `2.821`.
 
 ## Intended Use
 
@@ -122,7 +122,7 @@ Since the dataset contains profanity, politically incorrect language, and (unint
 
 ## Citation and Related Information
 
-This was done as a moonlighting project during summer of 2021 to better understand transformers. I didn't have much free time to open source it properly, so it all sat on my hard drive until now :)
+This was done as a moonlighting project during the summer of 2021 to better understand transformers. I didn't have much free time to open-source it properly, so it all sat on my hard drive until now :)
 
 If you use this model or have any questions about it, feel free to hit me up at [twitter](https://twitter.com/miloskondela) or check out my [github](https://github.com/kondela) profile.
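The training figures in the diff above lend themselves to a quick sanity check: dividing the token count by the step count gives the effective tokens per optimizer step, and exponentiating the cross-entropy loss gives the validation perplexity. A back-of-envelope sketch — the 2048-token context length is GPT-J's default and is an assumption here, not stated in the excerpt:

```python
import math

TOTAL_TOKENS = 36.5e9  # "a bit more than 36.5 billion tokens"
STEPS = 69_001
FINAL_LOSS = 2.821     # cross-entropy validation loss at the last step (nats)

tokens_per_step = TOTAL_TOKENS / STEPS   # ~529,000 tokens per optimizer step
seqs_per_step = tokens_per_step / 2048   # assumption: GPT-J's 2048-token context
perplexity = math.exp(FINAL_LOSS)        # ~16.8

print(f"{tokens_per_step:,.0f} tokens/step, ~{seqs_per_step:.0f} sequences/step")
print(f"validation perplexity ~ {perplexity:.1f}")
```

A perplexity around 16.8 on a 405M-parameter model is just an interpretation aid for the reported loss, not a figure from the model card itself.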
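Since the checkpoint lives on the Hugging Face Hub, it can presumably be loaded with the `transformers` library like its linked sibling models. A minimal, untested sketch — the repo id `Milos/slovak-gpt-j-405M` is inferred from the naming of the 162M and 1.4B variants and is not stated in this excerpt, and the Slovak prompt is only an illustration:

```python
# Hypothetical usage sketch; repo id inferred from the sibling models' naming.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Milos/slovak-gpt-j-405M"  # assumption, mirrors the 162M/1.4B repos

def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a Slovak continuation of `prompt` with greedy decoding."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding keeps the example deterministic; sampling usually reads better.
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Slovensko je krajina v strednej Európe."))
```

The first call downloads the weights, so expect a one-off delay; afterwards they are served from the local Hub cache.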