Model description

A character-level LSTM trained on the tiny_shakespeare dataset from Andrej Karpathy's blog post, The Unreasonable Effectiveness of Recurrent Neural Networks.

Made to experiment with Hugging Face and W&B.

Intended uses & limitations

The model predicts the next character from a variable-length input sequence. After 18 epochs of training, it generates somewhat coherent text. The example below extends a prompt greedily, appending the most likely next character n times.

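import tensorflow as tf

def generate_text(model, encoder, text, n):
    """Greedily append n characters to text, one prediction at a time."""
    vocab = encoder.get_vocabulary()  # index -> character lookup
    generated_text = text
    for _ in range(n):
        # Re-encode everything generated so far as a batch of one sequence.
        encoded = encoder([generated_text])
        pred = model.predict(encoded, verbose=0)
        # Take the index of the highest-scoring character.
        pred = tf.squeeze(tf.argmax(pred, axis=-1)).numpy()
        generated_text += vocab[pred]
    return generated_text

sample = "M"
print(generate_text(model, encoder, sample, 100))

Output:

MQLUS:
I will be so that the street of the state,
And then the street of the street of the state,
And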

Training and evaluation data

https://example.com

Training procedure

The dataset consists of various works of William Shakespeare concatenated into a single file, in which individual speeches are separated by a blank line (\n\n).
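As a minimal sketch of that layout, the snippet below splits a short two-speech sample on the \n\n separator. The function name `split_speeches` is made up for illustration, and treating everything before the first ":\n" as the speaker is an assumption about the speech format, not something this card specifies.

```python
def split_speeches(text):
    # Speeches are separated by a blank line, i.e. two consecutive newlines.
    return [speech for speech in text.split("\n\n") if speech.strip()]

# A short inline sample in the same layout; the real file would be read from disk.
sample_text = (
    "First Citizen:\n"
    "Before we proceed any further, hear me speak.\n"
    "\n"
    "All:\n"
    "Speak, speak.\n"
)

speeches = split_speeches(sample_text)
for speech in speeches:
    # Assumption: the speaker name is the text before the first ":\n".
    speaker, _, body = speech.partition(":\n")
    print(repr(speaker), "->", repr(body))
```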

The tokenizer is a Keras TextVectorization preprocessor that uses a simple character-based vocabulary.
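To illustrate what a character-based vocabulary does, here is a pure-Python stand-in. It is not the Keras layer itself: `build_vocab`, `encode`, and `decode` are hypothetical helpers, and the reserved padding and unknown entries at indices 0 and 1 are an assumption about how the encoder's vocabulary is laid out.

```python
from collections import Counter

PAD, UNK = "", "[UNK]"  # assumed reserved entries at indices 0 and 1

def build_vocab(text):
    # Most frequent characters first; ties are broken by character order
    # so the result is deterministic.
    counts = Counter(text)
    chars = sorted(counts, key=lambda c: (-counts[c], c))
    return [PAD, UNK] + chars

def encode(text, vocab):
    # Map each character to its index; unseen characters map to UNK (index 1).
    index = {ch: i for i, ch in enumerate(vocab)}
    return [index.get(ch, 1) for ch in text]

def decode(ids, vocab):
    # Inverse lookup: index -> character.
    return "".join(vocab[i] for i in ids)

vocab = build_vocab("hello world")
ids = encode("hello", vocab)
print(ids)
print(decode(ids, vocab))
```

The `vocab[pred]` lookup in the generation function relies on the same index-to-character mapping.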

To construct the training set, each run of 100 consecutive characters is used as the input, with the character that follows it as the target. The window advances one character at a time over the text, and the resulting 1,115,294 examples are shuffled.
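The windowing can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the code used for training: `make_examples`, its `seed` parameter, and the placeholder `ids` list are hypothetical. A sequence of N encoded characters yields N - 100 examples, so the figure of 1,115,294 corresponds to a text of 1,115,394 characters.

```python
import random

def make_examples(ids, window=100, seed=0):
    # One example per position that still has a following character:
    # the input is `window` ids, the target is the id right after them.
    examples = [
        (ids[i:i + window], ids[i + window])
        for i in range(len(ids) - window)
    ]
    # Shuffle so consecutive, heavily overlapping windows are not batched together.
    random.Random(seed).shuffle(examples)
    return examples

ids = list(range(250))  # placeholder for the encoded text
examples = make_examples(ids)
print(len(examples))  # 250 - 100 examples
```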

TODO: upload encoder

Training hyperparameters

Hyperparameter   Value
epochs           18
batch_size       1024
optimizer        AdamW
weight_decay     0.001
learning_rate    0.00025
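As a worked example of what these settings imply, the arithmetic below estimates the number of optimizer steps. It assumes the final partial batch of each epoch is kept rather than dropped, which this card does not state. The example count is the one given under Training procedure.

```python
import math

num_examples = 1_115_294  # training examples, from the training procedure
batch_size = 1024
epochs = 18

# Assumption: the last, partially filled batch of each epoch is kept.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)
```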

Model Plot

[Image: model plot]

Dataset used to train adamelliotfields/shakespeare