text2pandas-T5 / README.md
metadata
library_name: transformers
tags:
  - code generating
  - nlp
license: apache-2.0
datasets:
  - zeyadusf/text2pandas
language:
  - en
metrics:
  - bleu
  - rouge
base_model:
  - google-t5/t5-base
pipeline_tag: text2text-generation

Text to Pandas

Convert a natural-language question, together with context about your DataFrame, into pandas code in Python.


About Model :

I fine-tuned T5 (google-t5/t5-base), an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks, where each task is converted into a text-to-text format. Training used the Transformers library for 5 epochs, with a learning rate of 3e-5 and a cosine scheduler. The remaining hyperparameters are in the notebook.
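To make the cosine scheduler concrete, here is a minimal sketch of a cosine decay starting from the 3e-5 learning rate mentioned above. It assumes no warmup and a hypothetical `total_steps` value; the actual schedule settings (for example, whatever `lr_scheduler_type="cosine"` was combined with in Transformers) are in the notebook, not reproduced here.

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 3e-5) -> float:
    """Learning rate at `step` under a cosine decay from base_lr to 0 (assumes no warmup)."""
    # Clamp progress to [0, 1] so out-of-range steps do not wrap around the cosine.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical step count: the real value depends on dataset size, batch size, and the 5 epochs.
total_steps = 1000
for step in (0, 250, 500, 750, 1000):
    print(f"step {step:>4}: lr = {cosine_lr(step, total_steps):.2e}")
```

The rate starts at the base value, reaches half of it at the midpoint, and decays to zero at the final step.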
Results on the test dataset:

  1. Prediction Loss: 0.0463
  • This is the average loss on the test set during the prediction phase. A lower loss means the model's outputs are closer to the expected values, so a loss of 0.0463 suggests that the model is making fairly accurate predictions.
  2. Prediction ROUGE-1: 0.8396
  • ROUGE-1 measures the overlap of unigrams (single tokens) between the predicted text and the reference text (here, the generated pandas code and the ground truth). A score of 0.8396 (~84%) indicates a high level of overlap, meaning that the model produces most of the expected tokens.
  3. Prediction ROUGE-2: 0.8200
  • ROUGE-2 evaluates bigram (two-token) overlap between the predicted and reference texts. A score of 0.82 (~82%) suggests that the model also captures the relationships between adjacent tokens, which matters for generating coherent and syntactically correct code.
  4. Prediction ROUGE-L: 0.8396
  • ROUGE-L is based on the longest common subsequence (LCS) between the predicted and reference sequences, so it is sensitive to token order. A high ROUGE-L score (~84%) means the generated sequences align well with the true code in overall structure and ordering of operations. This is crucial when generating code, as the order of operations affects the logic.
  5. Prediction BLEU: 0.4729
  • BLEU measures n-gram precision: how many n-grams (contiguous token sequences) in the predicted output also appear in the reference output. A BLEU score of 0.4729 (~47%) is a moderate result for a text-to-code task. High BLEU scores are harder to reach in code generation because the metric rewards exact matches at the token level, including symbols and syntax and, depending on the tokenization, even whitespace.
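To illustrate what the ROUGE scores above measure, here is a minimal pure-Python sketch of ROUGE-N and ROUGE-L F1 scores. It assumes plain whitespace tokenization, and the prediction string is a hypothetical example; the numbers reported above were produced by the evaluation code in the notebook, which may tokenize and aggregate differently.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token windows of `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision (overlap/pred_total) and recall (overlap/ref_total)."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)

def rouge_n(prediction, reference, n):
    """ROUGE-N F1 using clipped n-gram overlap on whitespace tokens."""
    pred = Counter(ngrams(prediction.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    overlap = sum((pred & ref).values())  # Counter intersection clips repeated n-grams
    return f1(overlap, sum(pred.values()), sum(ref.values()))

def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(prediction, reference):
    """ROUGE-L F1 based on the LCS of whitespace tokens."""
    p, r = prediction.split(), reference.split()
    return f1(lcs_length(p, r), len(p), len(r))

reference = "df[df['years_for_rockets'] == '1998']['player'].count()"
prediction = "df[df['years_for_rockets'] == '1998']['player'].sum()"  # hypothetical model output

print(rouge_n(prediction, reference, 1))
print(rouge_n(prediction, reference, 2))
print(rouge_l(prediction, reference))
```

With whitespace tokenization each string is only three tokens long, so a single wrong method name changes the scores noticeably. A finer-grained tokenizer would penalize the same mistake less.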

In general, this is a promising result, showing that the model is performing well on the task, with room for improvement on exact token matching (reflected by the BLEU score).
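The sensitivity of BLEU to exact token matches can be seen in a small sketch. This is a textbook, unsmoothed sentence-level BLEU over whitespace tokens, written for illustration only; the BLEU implementation and tokenization behind the 0.4729 figure are not specified in this README, and the example strings are hypothetical.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-token window of `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(prediction, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n-grams."""
    pred, ref = prediction.split(), reference.split()
    if not pred:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        pred_counts = ngram_counts(pred, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(pred_counts.values())
        matched = sum((pred_counts & ref_counts).values())  # clipped matches
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: predictions shorter than the reference are penalized.
    brevity_penalty = 1.0 if len(pred) > len(ref) else math.exp(1 - len(ref) / len(pred))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)

# Hypothetical pre-tokenized code: symbols are separated by spaces.
reference = "df [ df [ 'years_for_rockets' ] == '1998' ] [ 'player' ] . count ( )"
prediction = "df [ df [ 'years_for_rockets' ] == '1998' ] [ 'player' ] . sum ( )"

print(bleu(reference, reference))    # identical strings score 1.0
print(bleu(prediction, reference))   # one wrong token lowers every n-gram order

# The same pair without symbol-level tokenization: only three whitespace tokens each.
print(bleu("df[df['years_for_rockets'] == '1998']['player'].sum()",
           "df[df['years_for_rockets'] == '1998']['player'].count()"))
```

A single mismatched token affects every n-gram window that contains it, which is one reason BLEU tends to trail ROUGE on code even when the predictions are mostly right.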

Inference Model :

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

tokenizer = AutoTokenizer.from_pretrained("zeyadusf/text2pandas-T5")
model = AutoModelForSeq2SeqLM.from_pretrained("zeyadusf/text2pandas-T5")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def generate_pandas(question, context, model, tokenizer, max_length=512, num_beams=4, early_stopping=True):
    """
    Generates text based on the provided question and context using a pre-trained model and tokenizer.

    Args:
        question (str): The question part of the input.
        context (str): The context (e.g., DataFrame description) related to the question.
        model (torch.nn.Module): The pre-trained language model (e.g., T5).
        tokenizer (PreTrainedTokenizer): The tokenizer corresponding to the model.
        max_length (int): Maximum length of the generated text.
        num_beams (int): The number of beams for beam search.
        early_stopping (bool): Whether to stop the beam search when enough hypotheses have reached the end.

    Returns:
        str: The generated text decoded by the tokenizer.
    """
    # Prepare the input text by combining the question and context
    input_text = f"<question> {question} <context> {context}"

    # Tokenize the input text, convert to tensor, and truncate if needed
    inputs = tokenizer.encode(input_text, return_tensors="pt", truncation=True, max_length=max_length)

    # Move inputs and model to the appropriate device
    inputs = inputs.to(device)
    model = model.to(device)

    # Generate predictions without calculating gradients
    with torch.no_grad():
        outputs = model.generate(inputs, max_length=max_length, num_beams=num_beams, early_stopping=early_stopping)

    # Decode the generated tokens into text, skipping special tokens
    predicted_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return predicted_text

# Example usage
question = "what is the total amount of players for the rockets in 1998 only?"
context = "df = pd.DataFrame(columns=['player', 'years_for_rockets'])"

# Generate and print the predicted text
predicted_text = generate_pandas(question, context, model, tokenizer)
print(predicted_text)

Output :

df[df['years_for_rockets'] == '1998']['player'].count()
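Two small helpers can be useful around the inference function: one to build the context string from a list of column names, and one to check that the generated code at least parses before you run it. The sketch below is an assumption-laden illustration: `build_context` and `is_valid_expression` are hypothetical helper names, and the context format is inferred from the single example in this README rather than from a documented specification.

```python
import ast

def build_context(columns, frame_name="df"):
    """Format column names like the example context string shown in this README."""
    return f"{frame_name} = pd.DataFrame(columns={list(columns)!r})"

def is_valid_expression(code):
    """Return True if `code` parses as a single Python expression (syntax check only)."""
    try:
        ast.parse(code, mode="eval")
    except SyntaxError:
        return False
    return True

context = build_context(["player", "years_for_rockets"])
print(context)

# A well-formed prediction parses; a truncated one does not.
print(is_valid_expression("df[df['years_for_rockets'] == '1998']['player'].count()"))
print(is_valid_expression("df[df['years_for_rockets'] == '1998'['player'].count("))
```

The syntax check does not confirm that the code is correct or safe to execute. It only filters out outputs that cannot be Python at all, so review generated code before running it on real data.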