---
license: mit
language:
- en
tags:
- gpu
---

# Text Summarization Model with Seq2Seq and LSTM

This model is a sequence-to-sequence (seq2seq) model for text summarization. It uses a bidirectional LSTM encoder and an LSTM decoder to generate summaries from input articles. The model was trained on a dataset with sequences of length up to 800 tokens.

## Model Architecture

### Encoder

- **Input Layer:** Takes input sequences of length `max_len_article`.
- **Embedding Layer:** Converts input sequences into dense vectors of size 100.
- **Bidirectional LSTM Layer:** Processes the embedded input, capturing dependencies in both forward and backward directions, and outputs hidden and cell states from both directions.
- **State Concatenation:** Combines the forward and backward hidden and cell states to form the final encoder states (see the sketch below).
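
A minimal Keras sketch of this encoder, assuming 100 LSTM units per direction; `max_len_article` and the article vocabulary size are inferred from the parameter counts in the summary table below, not taken from the original training code:

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Concatenate

# Assumed hyperparameters, inferred from the model summary below
max_len_article = 800        # maximum article length in tokens
article_vocab_size = 476199  # assumption: 47,619,900 embedding params / 100 dims
latent_dim = 100             # LSTM units per direction

# Encoder: embed the article, run a bidirectional LSTM, and keep its states
encoder_inputs = Input(shape=(max_len_article,))
enc_emb = Embedding(article_vocab_size, 100)(encoder_inputs)
encoder_outputs, fwd_h, fwd_c, bwd_h, bwd_c = Bidirectional(
    LSTM(latent_dim, return_state=True)
)(enc_emb)

# Concatenate the forward and backward states to form the final encoder states
state_h = Concatenate()([fwd_h, bwd_h])
state_c = Concatenate()([fwd_c, bwd_c])
encoder_states = [state_h, state_c]
```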

### Decoder

- **Input Layer:** Takes target sequences of variable length.
- **Embedding Layer:** Converts target sequences into dense vectors of size 100.
- **LSTM Layer:** Processes the embedded target sequences with an LSTM whose initial states are set to the encoder states.
- **Dense Layer:** Applies a Dense layer with softmax activation to produce a probability distribution over the vocabulary at each timestep (see the sketch below).
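
Continuing the encoder sketch above, a matching decoder sketch; the summary vocabulary size is read off the Dense output shape in the table below, and the 200-unit LSTM matches the size of the concatenated encoder states:

```python
from tensorflow.keras.layers import Dense

summary_vocab_size = 155158  # assumption: matches the Dense output size in the summary

# Decoder: embed the target summary and run an LSTM initialised with the encoder states
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(summary_vocab_size, 100)(decoder_inputs)
decoder_lstm = LSTM(latent_dim * 2, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)

# Project every timestep onto the summary vocabulary
decoder_dense = Dense(summary_vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
```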

### Model Summary

| Layer (type) | Output Shape | Param # | Connected to |
|--------------|--------------|---------|--------------|
| input_1 (InputLayer) | [(None, 800)] | 0 | - |
| embedding (Embedding) | (None, 800, 100) | 47,619,900 | input_1[0][0] |
| bidirectional (Bidirectional) | [(None, 200), (None, 100), (None, 100), (None, 100), (None, 100)] | 160,800 | embedding[0][0] |
| input_2 (InputLayer) | [(None, None)] | 0 | - |
| embedding_1 (Embedding) | (None, None, 100) | 15,515,800 | input_2[0][0] |
| concatenate (Concatenate) | (None, 200) | 0 | bidirectional[0][1], bidirectional[0][3] |
| concatenate_1 (Concatenate) | (None, 200) | 0 | bidirectional[0][2], bidirectional[0][4] |
| lstm (LSTM) | [(None, None, 200), (None, 200), (None, 200)] | 240,800 | embedding_1[0][0], concatenate[0][0], concatenate_1[0][0] |
| dense (Dense) | (None, None, 155158) | 31,186,758 | lstm[0][0] |

Total params: 94,724,058

Trainable params: 94,724,058

Non-trainable params: 0
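
Tying the two sketches above together should reproduce a summary similar to this table (layer names may differ from run to run):

```python
from tensorflow.keras.models import Model

# Combine the encoder and decoder sketches into a single training model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()  # prints a layer/parameter table like the one above
```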

## Training

The model was trained on a dataset with sequences of length up to 800 tokens, using the following configuration (a compile/fit sketch follows the list):

- **Optimizer:** Adam
- **Loss Function:** Categorical Crossentropy
- **Metrics:** Accuracy
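
A sketch of how this configuration might be applied; the data arrays, batch size, and validation split below are illustrative placeholders and are not stated in this card:

```python
# Compile with the configuration listed above
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Hypothetical training call: x_articles / x_summaries_in are padded token ids,
# y_summaries_out are the one-hot summary targets shifted by one timestep
model.fit(
    [x_articles, x_summaries_in],
    y_summaries_out,
    epochs=5,
    batch_size=64,          # assumption: batch size is not stated in this card
    validation_split=0.1,   # assumption: validation setup is not stated
)
```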

### Training Loss and Validation Loss

| Epoch | Training Loss | Validation Loss | Time per Epoch (s) |
|-------|---------------|-----------------|--------------------|
| 1 | 3.9044 | 0.4543 | 3087 |
| 2 | 0.3429 | 0.0976 | 3091 |
| 3 | 0.1054 | 0.0427 | 3096 |
| 4 | 0.0490 | 0.0231 | 3099 |
| 5 | 0.0203 | 0.0148 | 3098 |

### Test Loss

| Test Loss |
|-----------|
| 0.014802712015807629 |

## Usage (to be updated soon)

To use this model, load it with the Hugging Face Transformers library. The snippet below assumes the checkpoint will be published in a Transformers-compatible seq2seq format; replace `'your-model-name'` with the actual Hub repository id.

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Load the tokenizer and model from the Hub
tokenizer = AutoTokenizer.from_pretrained('your-model-name')
model = TFAutoModelForSeq2SeqLM.from_pretrained('your-model-name')

# Tokenize the input article, truncating to the 800-token training length
article = "Your input text here."
inputs = tokenizer.encode("summarize: " + article, return_tensors="tf", max_length=800, truncation=True)

# Generate and decode a summary with beam search
summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(summary)
```