transformersbook/codeparrot-small-vocabulary


lvwerra HF staff

Create README.md

96d7fd8 almost 3 years ago


# CodeParrot

This is a small-vocabulary version of the CodeParrot tokenizer, trained on the [CodeParrot Python code dataset](https://huggingface.co/datasets/transformersbook/codeparrot). The tokenizer is trained in Chapter 10, "Training Transformers from Scratch", of the [NLP with Transformers book](https://learning.oreilly.com/library/view/natural-language-processing/9781098103231/). You can find the full code in the accompanying [GitHub repository](https://github.com/nlp-with-transformers/notebooks/blob/main/10_transformers-from-scratch.ipynb).
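As a minimal sketch of how the tokenizer might be used, the snippet below loads it from the Hub and splits a piece of Python code into tokens. It assumes the `transformers` library is installed and that this repository's files can be loaded with `AutoTokenizer`. The helper names `load_tokenizer` and `tokenize_snippet` are hypothetical and exist only for illustration.

```python
# Hub ID taken from this repository's name.
REPO_ID = "transformersbook/codeparrot-small-vocabulary"


def load_tokenizer(repo_id=REPO_ID):
    """Fetch the tokenizer for repo_id from the Hub (or the local cache).

    Assumption: the repository holds files that AutoTokenizer can load.
    """
    # Imported lazily so that defining this helper does not require the library.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def tokenize_snippet(code, tokenizer=None):
    """Return the string tokens the tokenizer produces for a code snippet."""
    if tokenizer is None:
        tokenizer = load_tokenizer()
    return tokenizer.tokenize(code)
```

Calling `tokenize_snippet("def add(a, b):\n    return a + b")` downloads the tokenizer on first use and returns a list of string tokens. Passing an already loaded tokenizer through the `tokenizer` argument avoids reloading it for every snippet.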