radlab/polish-gpt2-small

Description

This is a Polish GPT-2 model in the small architecture.

This model was released on 11.08.2023 and is now deprecated.

A newer version of this model, radlab/polish-gpt2-small-v2, is available at https://huggingface.co/radlab/polish-gpt2-small-v2
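
The card does not document usage, so the following is only a minimal sketch. It assumes the checkpoint loads through the standard `transformers` text-generation pipeline, as GPT-2 checkpoints on the Hub usually do. The helper name `generate`, the prompt, and the sampling settings are illustrative choices, not taken from the card.

```python
MODEL_ID = "radlab/polish-gpt2-small"  # repository name from this card


def generate(prompt, model_id=MODEL_ID, max_new_tokens=40):
    """Return one sampled continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import pipeline

    # Assumption: the repository works with the generic text-generation pipeline.
    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return outputs[0]["generated_text"]


# Example call (downloads the model weights on first use):
# print(generate("Warszawa to miasto, które"))
```

Passing `model_id="radlab/polish-gpt2-small-v2"` should, under the same assumption, load the newer release instead.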

Datasets

Data used to train this model:

clarin-knext/msmarco-pl
clarin-knext/nq-pl
clarin-knext/hotpotqa-pl
clarin-knext/scidocs-pl
clarin-knext/nfcorpus-pl
clarin-knext/dbpedia-pl
clarin-knext/trec-covid-pl
clarin-knext/quora-pl
clarin-knext/arguana-pl
clarin-knext/fiqa-pl
our own corpora, not yet published

In total, this is about 10.5 GB of data.

Metrics from W&B

train/loss: 2.9569
train/train_samples_per_second: 31.797
train/epoch: 20
train/train_steps_per_second: 3.18
train/total_flos: 16645483478384640000
train/train_loss: 3.106043342053213
train/learning_rate: 2.2070550413783577e-8
train/global_step: 3185240
train/train_runtime: 1001735.8967
eval/samples_per_second: 57.896
eval/runtime: 1447.4458
eval/steps_per_second: 5.79
eval/loss: 2.890829086303711
eval/accuracy: 0.4637797431547294
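
A few derived quantities make these numbers easier to read. The sketch below recomputes them from the logged values. It assumes that `eval/loss` is the mean per-token cross-entropy in nats (the usual convention for causal language model training with the Hugging Face `Trainer`, but not stated in the card) and that `train/train_runtime` is in seconds.

```python
import math

# Values copied from the W&B metrics listed above.
eval_loss = 2.890829086303711
global_step = 3185240
epochs = 20
train_runtime_s = 1001735.8967
samples_per_second = 31.797
steps_per_second = 3.18

# Assumption: eval/loss is mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

# Optimizer steps per epoch and wall-clock training time in days.
steps_per_epoch = global_step / epochs
runtime_days = train_runtime_s / 86400

# Cross-check the logged throughput and estimate samples per optimizer step.
derived_steps_per_second = global_step / train_runtime_s
samples_per_step = samples_per_second / steps_per_second

print(f"perplexity: {perplexity:.2f}")
print(f"steps per epoch: {steps_per_epoch:.0f}")
print(f"training time: {runtime_days:.1f} days")
print(f"steps per second (derived): {derived_steps_per_second:.2f}")
print(f"samples per step: {samples_per_step:.1f}")
```

Under these assumptions the evaluation perplexity comes out at roughly 18, the run lasted between 11 and 12 days, and each optimizer step processed about 10 samples.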

Changelog

11.08.2023: published the first release of the model.

Model size

126M params (Safetensors, tensor type F32)
