stefan-it
/

xlstm-german-wikipedia

Text Generation

Model card Files Files and versions Community

stefan-it commited on Jun 10, 2024

Commit

609e6b7

·

verified ·

1 Parent(s): 2228461

readme: add initial version

Files changed (1) hide show

README.md +89 -3

README.md CHANGED Viewed

@@ -1,3 +1,89 @@
----
-license: cc-by-sa-3.0
----

+---
+license: cc-by-sa-3.0
+language:
+- de
+library_name: flair
+---
+# Flair xLSTM Embeddings (German Wikipedia, Forward)
+Research & development of Flair xLSTM Embeddings (Forward) trained on [German Wikipedia dump](https://huggingface.co/datasets/gwlms/dewiki-20230701-flair-corpus).
+The Flair team is currently working on the integration of xLSTM (both LM training and fine-tuning models for downstream tasks).
+Check out the `xlstm` [branch in the Flair repository](https://github.com/flairNLP/flair/tree/xlstm) - many thanks to [Patrick Haller](https://huggingface.co/PatrickHaller) for the work on it.
+# Training
+The current model was trained with commit `18ef331` from the [`xlstm` branch](https://github.com/flairNLP/flair/tree/xlstm). The `xlstm` [library](https://github.com/NX-AI/xlstm) needs to be installed manually - also check that `pip3 install Ninja` is installed.
+The German Wikipedia dump from [this repository](https://huggingface.co/datasets/gwlms/dewiki-20230701-flair-corpus) is used, including sharding the corpus into a Flair-compatible format:
+* `valid.txt` -> Validation corpus
+* `test.txt` -> Test corpus
+* `train` -> Folder with text files as training corpus
+The model was trained with the following parameters for 2 epochs:
+```python3
+import flair
+import torch
+from flair.data import SubTokenDictionary
+from flair.models import xLSTMLanguageModel
+from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus
+from transformers import AutoTokenizer
+flair.device = torch.device('cuda:0')
+is_forward_lm = True
+dictionary = SubTokenDictionary.load("gwlms/bert-base-dewiki-v1")
+corpus = TextCorpus("/home/ubuntu/splitted_corpus",
+                    dictionary,
+                    is_forward_lm,
+                    character_level=False,
+                    random_case_flip=True,
+                    )
+xlstm_ablation_1 = """
+mlstm_block:
+  mlstm:
+    conv1d_kernel_size: 2
+    qkv_proj_blocksize: 2
+    num_heads: 2
+slstm_block:
+  slstm:
+    backend: cuda
+    num_heads: 2
+    conv1d_kernel_size: 2
+    bias_init: powerlaw_blockdependent
+  feedforward:
+    proj_factor: 1.3
+    act_fn: gelu
+context_length: 256
+num_blocks: 7
+embedding_dim: 128
+slstm_at: [1]
+"""
+language_model = xLSTMLanguageModel(dictionary, xlstm_cfg=xlstm_ablation_1, is_forward_lm=True)
+print(language_model)
+trainer = LanguageModelTrainer(language_model, corpus)
+trainer.train("xflair-german-wikipedia-xlstm_ablation_1-bs64-lr5-e2",
+              sequence_length=256,
+              mini_batch_size=64,
+              learning_rate=5,
+              patience=50,
+              max_epochs=2,
+              checkpoint=False,
+              num_workers=4,
+              )
+```
+# Caveats
+Notice: this model integration is heavily under development. And in the process of finding good hyper-parameters. Also downstream experiments are coming very soon.