	---
	language:
	- en
	tags:
	- table-to-text
	- tabular
	datasets:
	- totto
	---

	# BLOOM (0.56B) fine-tuned on ToTTo for Table-to-text 📋 ➡️ 🔤

	This model is a fine-tuned version of [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m) on the ToTTo [dataset](https://huggingface.co/datasets/totto).


	## The model 🧠

	The base model is the 560M-parameter version of [BLOOM 🌸](https://bigscience.huggingface.co/blog/bloom).

	## The dataset 📚

	ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.

	During the dataset creation process, tables from English Wikipedia are matched with (noisy) descriptions. Each table cell mentioned in the description is highlighted and the descriptions are iteratively cleaned and corrected to faithfully reflect the content of the highlighted cells.
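
To make the task input concrete, here is a minimal sketch of how a table and its highlighted cells might be flattened into a single string before being passed to a language model. The tag names (`<page_title>`, `<cell>`, and so on) and the `linearize_highlighted` function are illustrative assumptions; the actual format this model expects is defined by the `preprocess` file in the repo and may differ. The input layout (a `table` given as rows of cells with a `value` key, and highlighted cells given as `[row, column]` pairs) is also assumed here.

```python
def linearize_highlighted(page_title, section_title, table, highlighted_cells):
    """Flatten the highlighted cells of a table into one string.

    Hypothetical format for illustration only; not the repo's preprocess file.
    """
    parts = [
        f"<page_title> {page_title} </page_title>",
        f"<section_title> {section_title} </section_title>",
        "<table>",
    ]
    # Keep only the highlighted cells, in the order they are listed
    for row_idx, col_idx in highlighted_cells:
        cell = table[row_idx][col_idx]
        parts.append(f"<cell> {cell['value']} </cell>")
    parts.append("</table>")
    return " ".join(parts)


# Toy example (made-up data, not taken from ToTTo)
toy_table = [
    [{"value": "Year"}, {"value": "Title"}],
    [{"value": "1999"}, {"value": "Example Film"}],
]
linearized = linearize_highlighted(
    page_title="Example Person",
    section_title="Filmography",
    table=toy_table,
    highlighted_cells=[[1, 0], [1, 1]],
)
print(linearized)
```

The target for such an input would be a single sentence describing only the highlighted cells, which is what makes the generation task "controlled".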


	### Evaluation results

	| Metric | Value |
	|:-------:|:-----:|
	| rouge1 | 0.56 |
	| rouge2 | 0.33 |
	| rougeL | 0.48 |
	| rougeLsum | 0.48 |
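
For intuition about what these numbers measure, the sketch below computes a simplified ROUGE-1 F1 score as clipped unigram overlap between a generated sentence and a reference. It is an illustration only: it uses lowercasing and plain whitespace tokenization, whereas the scores in the table were presumably produced with a standard ROUGE implementation that has its own tokenization and optional stemming, so values will not match exactly. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# 5 of 6 tokens overlap on each side, so precision = recall = 5/6
score = rouge1_f1("the cat sat on the mat", "the cat is on the mat")
print(round(score, 4))
```

ROUGE-2 applies the same idea to bigrams, and ROUGE-L replaces n-gram overlap with the longest common subsequence.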


	## Usage

	```py
	from datasets import load_dataset
	from transformers import BloomTokenizerFast, BloomForCausalLM

	from preprocess import preprocess  # This file is included in the repo

	valid_dataset = load_dataset('totto', split='validation')

	# Linearize the tables; the result is read from 'linearized_table' below
	valid_dataset = valid_dataset.map(preprocess)

	model_ckpt = "mrm8488/bloom-560m-finetuned-totto-table-to-text"

	tokenizer = BloomTokenizerFast.from_pretrained(model_ckpt)
	model = BloomForCausalLM.from_pretrained(model_ckpt).to("cuda")


	def explain_hl_cells(text):
	    # Tokenize the linearized table and move the tensors to the GPU
	    inputs = tokenizer(text, return_tensors='pt')
	    input_ids = inputs.input_ids.to("cuda")
	    attention_mask = inputs.attention_mask.to("cuda")
	    # Generate until the EOS token or max_length is reached
	    output = model.generate(input_ids, attention_mask=attention_mask, max_length=2048, eos_token_id=tokenizer.eos_token_id)

	    # The decoded string contains the prompt followed by the generated text
	    return tokenizer.decode(output[0], skip_special_tokens=False)


	example = valid_dataset[1]

	print(explain_hl_cells(example['linearized_table']))
	```
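
Because the model is a causal language model, `generate` returns the prompt followed by the continuation, and decoding with `skip_special_tokens=False` leaves any end-of-sequence marker in the string. A small post-processing helper along these lines could isolate the generated sentence. This is a sketch under assumptions: `extract_description` is a hypothetical name, the `</s>` default is an assumption about the tokenizer's EOS string (check `tokenizer.eos_token`), and the prompt is only stripped when the decoded text starts with it verbatim, which may not hold if tokenization does not round-trip exactly.

```python
def extract_description(decoded, prompt, eos_token="</s>"):
    """Return the text generated after the prompt, cut at the first EOS marker."""
    text = decoded
    # Drop the echoed prompt if the decoded string starts with it
    if text.startswith(prompt):
        text = text[len(prompt):]
    # Cut everything from the first EOS marker onwards, if one is present
    if eos_token:
        eos_pos = text.find(eos_token)
        if eos_pos != -1:
            text = text[:eos_pos]
    return text.strip()


# Toy strings (made up for illustration, not real model output)
prompt = "<table> <cell> 1999 </cell> </table>"
decoded = "<table> <cell> 1999 </cell> </table> An example sentence.</s>"
description = extract_description(decoded, prompt)
print(description)
```

An alternative that avoids string matching is to slice the generated token ids by the prompt length before decoding.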


	### Framework versions

	- Transformers 4.21.2
	- PyTorch 1.12.1+cu113
	- Datasets 2.4.0
	- Tokenizers 0.12.1


	Created by: [Narrativa](https://www.narrativa.com/)

	About Narrativa: Natural Language Generation (NLG) | Gabriele, our machine learning-based platform, builds and deploys natural language solutions. #NLG #AI