mlkorra
committed on
Commit
·
3f6af0f
1
Parent(s):
b3beee3
Update App with About Section
- About/accomplishments.md +5 -0
- About/applications.md +5 -0
- About/contributors.md +3 -0
- About/credits.md +2 -0
- About/datasets.md +3 -0
- About/gitrepo.md +2 -0
- About/results.md +8 -0
- app.py +8 -70
- images/baseline.png +0 -0
- navigate.py +11 -0
- pages/__init__.py +0 -0
- pages/__pycache__/__init__.cpython-38.pyc +0 -0
- pages/__pycache__/about.cpython-38.pyc +0 -0
- pages/__pycache__/inference.cpython-38.pyc +0 -0
- pages/about.py +43 -0
- pages/inference.py +61 -0
About/accomplishments.md
ADDED
@@ -0,0 +1,5 @@
+## Accomplishment
+
+* All of our models achieve better results than the baseline models on two metrics (Exact and SARI scores).
+* Our t5-base-wikisplit and t5-v1_1-base-wikisplit models achieve comparable results with half the model size (weights), which enables faster inference.
+* We added the [wikisplit](https://huggingface.co/metrics/wiki_split) metric, which is freely available in Hugging Face datasets, so relevant scores for this task are now easy to calculate.
About/applications.md
ADDED
@@ -0,0 +1,5 @@
+## Application
+* Sentence Simplification
+* Data Augmentation
+* Sentence Rephrase
+* Tweets Splitter - split long tweets into sub-tweets to maintain 140 character limit.
About/contributors.md
ADDED
@@ -0,0 +1,3 @@
+## Contributors
+* [Bhadresh Savani](https://www.linkedin.com/in/bhadreshsavani)
+* [Rahul Dev](https://twitter.com/mlkorra)
About/credits.md
ADDED
@@ -0,0 +1,2 @@
+## Credits
+Huge thanks to the Hugging Face 🤗 and Google JAX/Flax teams for such a wonderful community week, and especially for providing such massive computing resources. Big thanks to [Suraj Patil](https://huggingface.co/valhalla) & [Patrick von Platen](https://huggingface.co/patrickvonplaten) for solving our issues and mentoring us during the whole community week.
About/datasets.md
ADDED
@@ -0,0 +1,3 @@
+## Datasets used
+* [Wiki Split](https://research.google/tools/datasets/wiki-split/)
+* [Web Split](https://github.com/shashiongithub/Split-and-Rephrase)
About/gitrepo.md
ADDED
@@ -0,0 +1,2 @@
+## Github Repo
+* [t5-sentence-split](https://github.com/bhadreshpsavani/t5-sentence-split)
About/results.md
ADDED
@@ -0,0 +1,8 @@
+## Our Results
+
+| Model | Exact | SARI | BLEU |
+| --- | --- | --- | --- |
+| [t5-base-wikisplit](https://huggingface.co/flax-community/t5-base-wikisplit) | 17.93 | 67.5438 | 76.9 |
+| [t5-v1_1-base-wikisplit](https://huggingface.co/flax-community/t5-v1_1-base-wikisplit) | 18.1207 | 67.4873 | 76.9478 |
+| [byt5-base-wikisplit](https://huggingface.co/flax-community/byt5-base-wikisplit) | 11.3582 | 67.2685 | 73.1682 |
+| [t5-large-wikisplit](https://huggingface.co/flax-community/t5-large-wikisplit) | 18.6632 | 68.0501 | 77.1881 |
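The Exact column in results.md is an exact-match percentage. As a rough illustration of what such a score measures, the sketch below counts a prediction as correct when it equals any of its references after lowercasing and whitespace normalization. The function names and the normalization are assumptions made for illustration only; the `wiki_split` metric on the Hub applies its own normalization and also reports SARI and BLEU, so numbers from this sketch are not comparable with the table.

```python
from typing import List


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace (a simplifying assumption)."""
    return " ".join(text.lower().split())


def exact_match_percent(predictions: List[str], references: List[List[str]]) -> float:
    """Percentage of predictions that equal at least one of their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        any(normalize(pred) == normalize(ref) for ref in refs)
        for pred, refs in zip(predictions, references)
    )
    return 100.0 * hits / len(predictions)
```

Each prediction is paired with a list of references, mirroring the multi-reference layout commonly used for split-and-rephrase evaluation.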
app.py
CHANGED
@@ -1,75 +1,13 @@
 import streamlit as st
-from
-import
-
-
-@st.cache(show_spinner=False)
-def load_model(input_complex_sentence,model):
-
-    base_path = "flax-community/"
-    model_path = base_path + model
-    tokenizer = AutoTokenizer.from_pretrained(model_path)
-    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
-
-    tokenized_sentence = tokenizer(input_complex_sentence,return_tensors="pt")
-    result = model.generate(tokenized_sentence['input_ids'],attention_mask = tokenized_sentence['attention_mask'],max_length=256,num_beams=5)
-    generated_sentence = tokenizer.decode(result[0],skip_special_tokens=True)
-
-    return generated_sentence
+from pages import inference,about
+from navigate import Navigate
 
 def main():
-
-
-
-
-
-    st.sidebar.write("## UI Options")
-    model = st.sidebar.selectbox(
-        "Please Choose the Model",
-        ("t5-base-wikisplit","t5-v1_1-base-wikisplit", "byt5-base-wikisplit","t5-large-wikisplit"))
-
-    change_example = st.sidebar.checkbox("Try Random Examples")
-
-    st.sidebar.write('''
-    ## Applications:
-    * Sentence Simplification
-    * Data Augmentation
-    * Sentence Rephrase
-    ''')
-
-
-    st.sidebar.write("[More Exploration](https://github.com/bhadreshpsavani/t5-sentence-split)")
-
-    examples = [
-        "Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.",
-        "It broadcasts on AM frequency 1600 kHz and is under ownership of Multicultural Broadcasting with studios in Surrey , British Columbia .",
-        "On March 1 , the Blackhawks played in their 2nd outdoor game in franchise history at Soldier Field in part of the new NHL Stadium Series ",
-        "'' The Rain Song '' is a love ballad , over 7 minutes in length , and is considered by singer Robert Plant to be his best overall vocal performance .",
-        "The resulting knowledge about human kinesiology and sport nutrition combined with his distinctive posing styles makes Kamali a sought out bodybuilder for seminars and guest appearances and has been featured in many bodybuilding articles , as well as being on the cover of MUSCLEMAG magazine .",
-        "The East London Line closed on 22 December 2007 and reopened on 27 April 2010 , becoming part of the new London Overground system .",
-        "' Bandolier - Budgie ' , a free iTunes app for iPad , iPhone and iPod touch , released in December 2011 , tells the story of the making of Bandolier in the band 's own words - including an extensive audio interview with Burke Shelley .",
-        "' Eden Black ' was grown from seed in the late 1980s by Stephen Morley , under his conditions it produces pitchers that are almost completley black .",
-        "' Wilson should extend his stint on The Voice to renew public interest in the band ; given that they 're pulling out all the stops , they deserve all the acclaim that surrounded them for their first two albums .",
-        "'' '' New York Mining Disaster 1941 '' '' was the second EP released by the Bee Gees in 1967 on the Spin Records , like their first EP , it was released only in Australia .",
-        "'' ADAPTOGENS : Herbs for Strength , Stamina , and Stress Relief , '' Healing Arts Press , 2007 - contains a detailed monograph on Schisandra chinensis as well as highlights health benefits ."
-    ]
-
-    if change_example:
-        example = examples[random.randint(0, len(examples)-1)]
-        input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
-        split = st.button('Change and Split✂️')
-    else:
-        example=examples[0]
-        input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
-        split = st.button('Split✂️')
-
-    if split:
-        with st.spinner("Spliting Sentence...🧠"):
-            generated_sentence = load_model(input_complex_sentence, model)
-            sentence1, sentence2, _ = generated_sentence.split(".")
-            st.write("**Sentence1:** "+sentence1+".")
-            st.write("**Sentence2:** "+sentence2+".")
-
+
+    app = Navigate()
+    app.add_app("Inference", inference.load_page)
+    app.add_app("About", about.load_page)
+    app.run()
 
 if __name__ == "__main__":
-
+    main()
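`load_model`, removed from app.py in this commit and re-added in pages/inference.py, builds the tokenizer and model inside the same cached function that runs generation. Because the cache key includes the input sentence, each new sentence leads to another pair of `from_pretrained` calls. One possible alternative, sketched below with a stand-in loader so it runs without downloading anything, is to cache the expensive load per model name and keep generation separate. `fake_load`, `get_model`, and `split_sentence` are hypothetical names; in the real app the loader would call `AutoTokenizer.from_pretrained` and `AutoModelForSeq2SeqLM.from_pretrained`.

```python
from functools import lru_cache

LOAD_CALLS = []  # records every time the (stand-in) loader actually runs


def fake_load(model_path):
    """Stand-in for the expensive tokenizer/model load (assumption)."""
    LOAD_CALLS.append(model_path)
    return {"path": model_path}


@lru_cache(maxsize=4)
def get_model(model_name):
    """Load once per model name; later calls reuse the cached object."""
    return fake_load("flax-community/" + model_name)


def split_sentence(sentence, model_name):
    """Fetch the cached model, then run the per-sentence work."""
    model = get_model(model_name)
    # Placeholder for tokenizing the sentence and calling model.generate(...)
    return "[" + model["path"] + "] " + sentence


# Two different inputs with the same model trigger only one load.
split_sentence("First input.", "t5-base-wikisplit")
split_sentence("Second input.", "t5-base-wikisplit")
```

The design choice here is simply to key the cache on the model name rather than on the sentence, so switching sentences stays cheap and only switching models pays the load cost.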
images/baseline.png
ADDED
navigate.py
ADDED
@@ -0,0 +1,11 @@
+import streamlit as st
+
+class Navigate:
+    def __init__(self):
+        self.apps = []
+    def add_app(self, title, func):
+        self.apps.append({"title": title, "function": func})
+    def run(self):
+        #st.sidebar.header("Sections")
+        app = st.sidebar.radio("", self.apps, format_func=lambda app: app["title"])
+        app["function"]()
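`Navigate` is a small registry: `add_app` stores a title and a callable, and `run` asks the sidebar radio which entry is selected and then calls its function. The sketch below reproduces that dispatch without Streamlit by passing the selection step in as a function. `Navigator`, `choose`, and the lambda pages are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


class Navigator:
    """Streamlit-free variant of the page registry, for illustration."""

    def __init__(self) -> None:
        self.apps: List[Dict] = []

    def add_app(self, title: str, func: Callable[[], object]) -> None:
        # Same record shape as in navigate.py: a title plus a callable.
        self.apps.append({"title": title, "function": func})

    def run(self, choose: Callable[[List[Dict]], Dict]):
        # `choose` plays the role of st.sidebar.radio: it picks one entry.
        app = choose(self.apps)
        return app["function"]()


nav = Navigator()
nav.add_app("Inference", lambda: "inference page")
nav.add_app("About", lambda: "about page")
first = nav.run(lambda apps: apps[0])
```

Injecting the chooser keeps the dispatch logic testable on its own, while the Streamlit version can pass a function that wraps the radio widget.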
pages/__init__.py
ADDED
File without changes
pages/__pycache__/__init__.cpython-38.pyc
ADDED
Binary file (172 Bytes).
pages/__pycache__/about.cpython-38.pyc
ADDED
Binary file (1.89 kB).
pages/__pycache__/inference.cpython-38.pyc
ADDED
Binary file (3.85 kB).
pages/about.py
ADDED
@@ -0,0 +1,43 @@
+import streamlit as st
+import os
+
+def read_markdown(path, folder="./About/"):
+    with open(os.path.join(folder, path)) as f:
+        return f.read()
+
+def load_page():
+
+    st.markdown(""" # T5 for Sentence Split in English """)
+    st.markdown(""" ### Sentence Split is the task of dividing a complex sentence into two simple sentences """)
+
+    st.markdown(""" ## Goal """)
+    st.markdown(""" To make the best sentence split model available so far """)
+
+    st.markdown(""" ## How to use the Model """)
+    st.markdown("""
+
+    ```python
+    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+    tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-base-wikisplit")
+    model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-base-wikisplit")
+
+    complex_sentence = "This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."
+    sample_tokenized = tokenizer(complex_sentence, return_tensors="pt")
+
+    answer = model.generate(sample_tokenized['input_ids'], attention_mask = sample_tokenized['attention_mask'], max_length=256, num_beams=5)
+    gene_sentence = tokenizer.decode(answer[0], skip_special_tokens=True)
+    gene_sentence
+
+    \"""
+    Output:
+    This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director.
+    \"""
+
+    ``` """)
+    st.markdown(read_markdown("datasets.md"))
+    st.markdown(read_markdown("applications.md"))
+    st.markdown(read_markdown("results.md"))
+    st.markdown(read_markdown("accomplishments.md"))
+    st.markdown(read_markdown("gitrepo.md"))
+    st.markdown(read_markdown("contributors.md"))
+    st.markdown(read_markdown("credits.md"))
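`read_markdown` opens files under `./About/`, which resolves against the current working directory, and it relies on the platform's default text encoding. A hedged variant is sketched below: it takes the folder explicitly and reads the file as UTF-8, which matters for credits.md because it contains an emoji. The name `read_markdown_utf8` and the temporary-directory demo are assumptions for illustration and are not part of the commit.

```python
import tempfile
from pathlib import Path


def read_markdown_utf8(name, folder):
    """Read one Markdown file from `folder`, always decoding as UTF-8."""
    return (Path(folder) / name).read_text(encoding="utf-8")


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "credits.md").write_text("## Credits\nThanks 🤗\n", encoding="utf-8")
    demo_text = read_markdown_utf8("credits.md", tmp)
```

In the app itself, the folder argument could be derived from the module's own location instead of the working directory; that choice is left open here.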
pages/inference.py
ADDED
@@ -0,0 +1,61 @@
+import streamlit as st
+from transformers import AutoTokenizer,AutoModelForSeq2SeqLM
+import random
+
+
+@st.cache(show_spinner=False)
+def load_model(input_complex_sentence,model):
+
+    base_path = "flax-community/"
+    model_path = base_path + model
+    tokenizer = AutoTokenizer.from_pretrained(model_path)
+    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
+
+    tokenized_sentence = tokenizer(input_complex_sentence,return_tensors="pt")
+    result = model.generate(tokenized_sentence['input_ids'],attention_mask = tokenized_sentence['attention_mask'],max_length=256,num_beams=5)
+    generated_sentence = tokenizer.decode(result[0],skip_special_tokens=True)
+
+    return generated_sentence
+
+def load_page():
+
+    st.sidebar.title("🧠Sentence Simplifier")
+    st.title("Sentence Split in English using T5 Variants")
+    st.write("Sentence Split is the task of **dividing a long Complex Sentence into Simple Sentences**")
+
+    st.sidebar.write("## UI Options")
+    model = st.sidebar.selectbox(
+        "Please Choose the Model",
+        ("t5-base-wikisplit","t5-v1_1-base-wikisplit", "byt5-base-wikisplit","t5-large-wikisplit"))
+
+    change_example = st.sidebar.checkbox("Try Random Examples")
+
+    examples = [
+        "Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.",
+        "It broadcasts on AM frequency 1600 kHz and is under ownership of Multicultural Broadcasting with studios in Surrey , British Columbia .",
+        "On March 1 , the Blackhawks played in their 2nd outdoor game in franchise history at Soldier Field in part of the new NHL Stadium Series ",
+        "'' The Rain Song '' is a love ballad , over 7 minutes in length , and is considered by singer Robert Plant to be his best overall vocal performance .",
+        "The resulting knowledge about human kinesiology and sport nutrition combined with his distinctive posing styles makes Kamali a sought out bodybuilder for seminars and guest appearances and has been featured in many bodybuilding articles , as well as being on the cover of MUSCLEMAG magazine .",
+        "The East London Line closed on 22 December 2007 and reopened on 27 April 2010 , becoming part of the new London Overground system .",
+        "' Bandolier - Budgie ' , a free iTunes app for iPad , iPhone and iPod touch , released in December 2011 , tells the story of the making of Bandolier in the band 's own words - including an extensive audio interview with Burke Shelley .",
+        "' Eden Black ' was grown from seed in the late 1980s by Stephen Morley , under his conditions it produces pitchers that are almost completley black .",
+        "' Wilson should extend his stint on The Voice to renew public interest in the band ; given that they 're pulling out all the stops , they deserve all the acclaim that surrounded them for their first two albums .",
+        "'' '' New York Mining Disaster 1941 '' '' was the second EP released by the Bee Gees in 1967 on the Spin Records , like their first EP , it was released only in Australia .",
+        "'' ADAPTOGENS : Herbs for Strength , Stamina , and Stress Relief , '' Healing Arts Press , 2007 - contains a detailed monograph on Schisandra chinensis as well as highlights health benefits ."
+    ]
+
+    if change_example:
+        example = examples[random.randint(0, len(examples)-1)]
+        input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
+        split = st.button('Change and Split✂️')
+    else:
+        example=examples[0]
+        input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
+        split = st.button('Split✂️')
+
+    if split:
+        with st.spinner("Splitting Sentence...🧠"):
+            generated_sentence = load_model(input_complex_sentence, model)
+            sentence1, sentence2, _ = generated_sentence.split(".")
+            st.write("**Sentence1:** "+sentence1+".")
+            st.write("**Sentence2:** "+sentence2+".")
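The final lines of `load_page` unpack `generated_sentence.split(".")` into exactly three names, which raises `ValueError` whenever the generated text does not contain exactly two periods (for example a decimal number, an abbreviation, or a three-sentence output). A more tolerant approach, sketched here as an assumption rather than as the authors' fix, splits on sentence-ending punctuation that is followed by whitespace and returns however many sentences result. `split_generated` is a hypothetical helper.

```python
import re
from typing import List


def split_generated(text: str) -> List[str]:
    """Split model output into sentences without assuming there are exactly two."""
    # Break only where ., ! or ? is followed by whitespace, so "3.5" stays intact.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]
```

In the page, the result could be displayed with a loop over `enumerate(split_generated(generated_sentence), start=1)`, writing one labelled line per sentence instead of the fixed `Sentence1` and `Sentence2` pair.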