
Model Card for mTk-SQuAD_en-SQAD_cs-1B

This model is a generative in-context few-shot learner specialized in Czech. It was trained on a combination of the English SQuAD and Czech SQAD datasets.

Detailed information is available in the project's GitHub repository and in the referenced paper.

Model Details

Model Description

  • Developed by: Michal Stefanik & Marek Kadlcik, Masaryk University
  • Model type: mt5
  • Language(s) (NLP): cs,en
  • License: MIT
  • Finetuned from model: google/mt5-large

Model Sources

Uses

This model is intended to be used in a few-shot in-context learning format in the target language (Czech), or in the source language (English, see below). It was evaluated for unseen task learning (with k=3 demonstrations) in Czech: see the referenced paper for details.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("{this model path}")
tokenizer = AutoTokenizer.from_pretrained("{this model path}")

# For Czech few-shot prompts, replace the keywords below with "Otázka", "Kontext" and "Odpověď"
input_text = """
    Question: What is the customer's name? 
    Context: Origin: Barrack Obama, Customer id: Bill Moe. 
    Answer: Bill Moe, 
    Question: What is the customer's name? 
    Context: Customer id: Barrack Obama, if not deliverable, return to Bill Clinton. 
    Answer:
"""

inputs = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**inputs)

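print("Answer:")
# generate() returns a batch of token-id sequences; decode the first (and only) one
print(tokenizer.decode(outputs[0], skip_special_tokens=True))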
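
The few-shot prompt used above can also be assembled programmatically. The following is a minimal sketch, not the authors' own tooling: `build_prompt` and `KEYWORDS` are hypothetical names introduced here for illustration, and the layout (one field per line, a trailing comma after each demonstration answer) simply mirrors the example prompt in this card. Whether this exact layout matches the training format is an assumption; see the referenced paper for details.

```python
from typing import List, Tuple

# Keyword sets per language. The Czech keywords are the ones named in this card;
# the dictionary itself is a hypothetical helper, not part of any library.
KEYWORDS = {
    "en": ("Question", "Context", "Answer"),
    "cs": ("Otázka", "Kontext", "Odpověď"),
}


def build_prompt(
    demonstrations: List[Tuple[str, str, str]],
    question: str,
    context: str,
    lang: str = "en",
) -> str:
    """Build a few-shot prompt from (question, context, answer) demonstrations.

    Assumption: each demonstration answer ends with a comma, as in the
    example prompt of this card.
    """
    q_kw, c_kw, a_kw = KEYWORDS[lang]
    lines = []
    for demo_question, demo_context, demo_answer in demonstrations:
        lines.append(f"{q_kw}: {demo_question}")
        lines.append(f"{c_kw}: {demo_context}")
        lines.append(f"{a_kw}: {demo_answer},")
    # The final block leaves the answer empty for the model to complete.
    lines.append(f"{q_kw}: {question}")
    lines.append(f"{c_kw}: {context}")
    lines.append(f"{a_kw}:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative Czech prompt with a single demonstration (made-up data).
    prompt = build_prompt(
        [("Jak se jmenuje zákazník?", "Původ: Barack Obama, ID zákazníka: Bill Moe.", "Bill Moe")],
        "Jak se jmenuje zákazník?",
        "ID zákazníka: Barack Obama, při nedoručení vraťte na adresu Bill Clinton.",
        lang="cs",
    )
    print(prompt)
```

The resulting string can be passed to the tokenizer in place of `input_text`. The card reports evaluation with k=3 demonstrations, so three tuples in `demonstrations` would match that setting.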

Training Details

Training this model can be reproduced by running pip install -r requirements.txt && python train_mt5_qa_en_SQuAD+cs_random.py. See the referenced script for hyperparameters and other training configurations.

Citation

BibTeX:

@inproceedings{stefanik2023resources,
               author = {\v{S}tef\'{a}nik, Michal and Kadlčík, Marek and Gramacki, Piotr and Sojka, Petr},
               title = {Resources and Few-shot Learners for In-context Learning in Slavic Languages},
               booktitle = {Proceedings of the 9th Workshop on Slavic Natural Language Processing},
               publisher = {ACL},
               numpages = {9},
               url = {https://arxiv.org/abs/2304.01922},
}
Model size: 1.23B params (Safetensors), tensor type: F32