Language Modelling with Phonemes
The models, tokenizers, and datasets used for our submission to BabyLM 2024, which investigates the viability of training LLMs on phoneme streams.
AI & ML interests
Child language acquisition, CHILDES, word segmentation, phonemes, BabyLM
Collections: 1
Spaces: 1
Models: 124
phonemetransformers/BABYLM-TOKENIZER-MEAN-ENTROPY-TXT
phonemetransformers/babylm-subwords-2-gpt2_lm-model • Text Generation • 4
phonemetransformers/babylm-subwords-gpt2_lm-model
phonemetransformers/BABYLM-TOKENIZER-MEAN-Entropy-SPACELESS
phonemetransformers/BABYLM-TOKENIZER-MIN-Entropy-SPACELESS
phonemetransformers/BABYLM-TOKENIZER-MIN-Boundaryprediction-SPACELESS
phonemetransformers/BABYLM-TOKENIZER-MEAN-Boundaryprediction-SPACELESS
phonemetransformers/childes-segmentation-random-18M-gpt2_lm-model • Text Generation • 2
phonemetransformers/childes-segmentation-100k-gpt2_lm-model • Text Generation • 112
phonemetransformers/childes-segmentation-18M-gpt2_lm-model • Text Generation • 8