library_name: tokenizers
tags:
  - Danish
  - Morphological Tokenization
  - CerebrasGPT

DA-MORPH-CEREBRAS-TOKEN

This morphological tokenizer is designed for the CerebrasGPT architecture. It segments Danish text according to linguistic (morphological) principles, so that the resulting subword tokens align more closely with meaningful units of the language.
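
This card does not describe the tokenizer's segmentation rules. As a rough illustration of what morpheme-aligned segmentation means, here is a toy sketch in plain Python. It assumes a small hand-written morpheme inventory and a greedy longest-match strategy. Both are hypothetical and chosen only for the example; neither is taken from this tokenizer.

```python
# Hypothetical morpheme inventory, for illustration only.
# It is NOT the vocabulary of this tokenizer.
MORPHEMES = {"hus", "bil", "ven", "lig", "hed", "er", "ne", "ene", "en"}


def segment(word, morphemes=MORPHEMES):
    """Split a word into known morphemes using greedy longest match.

    Characters that do not start any known morpheme are emitted
    one at a time as a fallback.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate substring first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in morphemes:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known morpheme starts at position i.
            pieces.append(word[i])
            i += 1
    return pieces


if __name__ == "__main__":
    for w in ["bilerne", "husene", "venlighed"]:
        print(w, "->", segment(w))
    # bilerne -> ['bil', 'er', 'ne']
    # husene -> ['hus', 'ene']
    # venlighed -> ['ven', 'lig', 'hed']
```

In this sketch, "bilerne" ("the cars") splits into the stem "bil", the plural "er", and the definite suffix "ne". A real morphological tokenizer would need a far larger inventory and a more robust way of resolving ambiguous splits.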