|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
pipeline_tag: text2text-generation |
|
tags: |
|
- information extraction |
|
- entity linking |
|
- named entity recognition
|
- relation extraction |
|
- text-to-text generation |
|
--- |
|
# T5-for-information-extraction |
|
|
|
This is an encoder-decoder model that was trained on various information extraction tasks, including text classification, named entity recognition, relation extraction and entity linking. |
|
|
|
### How to use: |
|
First of all, initialize the model: |
|
```python |
|
from transformers import T5Tokenizer, T5ForConditionalGeneration |
|
import torch |
|
|
|
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
|
|
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large") |
|
|
|
model = T5ForConditionalGeneration.from_pretrained("knowledgator/t5-for-ie").to(device) |
|
``` |
|
|
|
Set a task prompt and pass it together with the text to the model. Below are examples for different tasks:
|
|
|
**named entity recognition** |
|
```python |
|
input_text = "Extract entity types from the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>." |
|
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device) |
|
|
|
outputs = model.generate(input_ids) |
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
``` |
|
|
|
**text classification** |
|
```python |
|
input_text = "Classify the following text into the most relevant categories: Kyiv is the capital of Ukraine" |
|
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device) |
|
|
|
outputs = model.generate(input_ids) |
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
``` |
|
|
|
**relation extraction** |
|
```python |
|
input_text = "Extract relations between entities in the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>." |
|
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device) |
|
|
|
outputs = model.generate(input_ids) |
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
``` |
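The three prompts above share the same shape: a task instruction, a colon, and the input text, with entities wrapped in `<e1>`/`<e2>` markers. A small helper can keep this consistent across tasks (a sketch: the `mark_entities` and `build_prompt` names are ours, not part of the model repo):

```python
# Hypothetical helpers for building prompts in the format shown above.
# The marker scheme (<e1>...</e1>, <e2>...</e2>) mirrors the examples.

def mark_entities(text: str, entities: list[str]) -> str:
    """Wrap the first occurrence of each entity in numbered <eN>...</eN> markers."""
    for i, entity in enumerate(entities, start=1):
        text = text.replace(entity, f"<e{i}>{entity}</e{i}>", 1)
    return text

def build_prompt(instruction: str, text: str) -> str:
    """Prepend a task instruction to the (optionally marked) text."""
    return f"{instruction}: {text}"

marked = mark_entities("Kyiv is the capital of Ukraine.", ["Kyiv", "Ukraine"])
prompt = build_prompt("Extract relations between entities in the text", marked)
print(prompt)
# Extract relations between entities in the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>.
```

The resulting string can be tokenized and passed to `model.generate` exactly as in the snippets above.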
|
### Unlimited-classifier |
|
With our [unlimited-classifier](https://github.com/Knowledgator/unlimited_classifier) you can use `t5-for-ie` to classify text into millions of categories. It applies constrained generation, which is very helpful when structured, deterministic outputs are needed.
|
|
|
To install it, run the following command: |
|
|
|
```bash |
|
pip install -U unlimited-classifier |
|
``` |
|
|
|
You can try it with the following example:
|
```python |
|
from unlimited_classifier import TextClassifier |
|
|
|
labels=[ |
|
"e1 - capital of Ukraine", |
|
"e1 - capital of Poland", |
|
"e1 - European city", |
|
"e1 - Asian city", |
|
"e1 - small country" |
|
] |
|
|
|
classifier = TextClassifier( |
|
    labels=['default'], # placeholder; the real labels are set below via initialize_labels_trie
|
model=model, |
|
tokenizer=tokenizer, |
|
    device=device # torch device selected earlier, e.g. "cuda" or "cpu"
|
) |
|
classifier.initialize_labels_trie(labels) |
|
|
|
text = "<e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."
|
|
|
output = classifier.invoke(text) |
|
print(output) |
|
``` |
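With millions of categories, the candidate labels are usually generated programmatically rather than written by hand before being passed to the classifier. A minimal sketch, assuming labels follow the `"entity - description"` template from the example above (the concrete descriptions here are illustrative only):

```python
from itertools import product

# Build candidate labels as "entity - description" strings, matching the
# template used in the example above.
entities = ["e1", "e2"]
descriptions = [
    "capital of Ukraine",
    "European city",
    "sovereign country",
]

labels = [f"{e} - {d}" for e, d in product(entities, descriptions)]
print(len(labels))  # 6 candidate labels
```

The resulting list can be passed to `initialize_labels_trie` as in the snippet above.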
|
|
|
### Turbo T5 |
|
|
|
We recommend running this model on GPU with our [TurboT5 package](https://github.com/Knowledgator/TurboT5): its custom CUDA kernels accelerate computation and allow much longer sequences.
|
|
|
First, install the package:
|
|
|
```bash
|
pip install turbot5 -U |
|
``` |
|
Then you can import different heads for various purposes. We released additional encoder heads for tasks such as token classification, question answering, and text classification, as well as encoder-decoder heads for conditional generation:
|
|
|
```python |
|
from turbot5 import T5ForConditionalGeneration |
|
from turbot5 import T5Config |
|
from transformers import T5Tokenizer |
|
import torch |
|
|
|
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large") |
|
model = T5ForConditionalGeneration.from_pretrained("knowledgator/t5-for-ie", |
|
    attention_type='flash', # choose the attention implementation
|
use_triton=True).to('cuda') |
|
``` |
|
|
|
### Feedback |
|
We value your input! Share your feedback and suggestions to help us improve our models. |
|
Fill out the feedback [form](https://forms.gle/5CPFFuLzNWznjcpL7) |
|
|
|
### Join Our Discord |
|
Connect with our community on Discord for news, support, and discussion about our models. |
|
Join [Discord](https://discord.gg/dkyeAgs9DG) |
|
|