Kirili4ik/mbart_ruDialogSum

Kirill Gelvan committed on Jan 14, 2022

Commit 455b203 • 1 Parent(s): 1f5f0c4

major update with code

Files changed (1)

README.md +45 -1

@@ -6,7 +6,9 @@ tags:
 - mbart
 inference:
   parameters:
-    no_repeat_ngram_size: 4
+    no_repeat_ngram_size: 4,
+    top_k : 0,
+    num_beams : 5,
 datasets:
 - IlyaGusev/gazeta
 - samsum
@@ -44,3 +46,45 @@ model-index:
           value: 28
 ---
 ### 📝 Description
+mBART for Russian summarization, fine-tuned for **dialogue** summarization.
+This model was first fine-tuned by [Ilya Gusev](https://hf.co/IlyaGusev) on the [Gazeta dataset](https://huggingface.co/datasets/IlyaGusev/gazeta). We then **fine-tuned** that model on the [SamSum dataset]() **translated to Russian** using the Google Translate API.
+⚠️ Due to the specifics of the data, the Hosted Inference API may not work properly ⚠️
+🤗 We have also implemented a **Telegram bot [@summarization_bot](https://t.me/summarization_bot)** that runs inference with this model. Add it to a chat and get summaries instead of dozens of spam messages! 🤗
+### ❓ How to use with code
+```python
+from transformers import MBartTokenizer, MBartForConditionalGeneration
+# Download model and tokenizer
+model_name = "Kirili4ik/mbart_ruDialogSum"
+tokenizer = MBartTokenizer.from_pretrained(model_name)
+model = MBartForConditionalGeneration.from_pretrained(model_name)
+model.eval()
+article_text = "..."
+input_ids = tokenizer(
+    [article_text],
+    max_length=600,
+    padding="max_length",
+    truncation=True,
+    return_tensors="pt",
+)["input_ids"]
+output_ids = model.generate(
+    input_ids=input_ids,
+    top_k=0,
+    num_beams=3,
+    no_repeat_ngram_size=3
+)[0]
+summary = tokenizer.decode(output_ids, skip_special_tokens=True)
+print(summary)
+```
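
The commit sets `no_repeat_ngram_size` to 4 for the hosted widget and to 3 in the usage snippet. As a rough illustration of what that setting controls, the sketch below lists which next tokens would complete an n-gram that already occurs in the generated sequence. It is a simplified stand-in written for this note, not the `transformers` implementation, and the function name `banned_next_tokens` and the toy token ids are hypothetical.

```python
def banned_next_tokens(token_ids, n):
    """Return the tokens that would repeat an existing n-gram if generated next.

    Simplified illustration of the idea behind no_repeat_ngram_size;
    not the code used inside transformers.
    """
    # Nothing can be banned until at least one full n-gram exists.
    if n <= 0 or len(token_ids) < n:
        return set()

    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(token_ids[len(token_ids) - (n - 1):])

    banned = set()
    for i in range(len(token_ids) - n + 1):
        # If an earlier n-gram starts with the same prefix,
        # its final token must not be generated again.
        if tuple(token_ids[i:i + n - 1]) == prefix:
            banned.add(token_ids[i + n - 1])
    return banned


# Toy sequence of token ids (hypothetical values).
generated = [5, 6, 7, 5, 6]

print(banned_next_tokens(generated, 3))  # the 3-gram (5, 6, 7) already exists
print(banned_next_tokens(generated, 4))  # no 4-gram would be repeated yet
```

On this toy sequence, the smaller value blocks a continuation that the larger value still allows, which is one reason the widget and the snippet may produce different summaries for the same input.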