HarshitX committed on
Commit df218ae · verified · 1 Parent(s): c0d4229

Upload 6 files

```markdown
# PDF Summarizer Streamlit App

Welcome to the PDF Summarizer Streamlit App! This application lets users upload PDF documents and ask questions about their content. Answers are generated by Google's Gemini model from the text of the uploaded document.

## Features

- **PDF Upload**: Upload a PDF file for a chosen category through the Streamlit interface.
- **Text Extraction**: Automatically extract text from the uploaded PDF documents.
- **Text Chunking**: Split extracted text into manageable chunks for efficient processing.
- **Vector Store**: Store text embeddings in a FAISS vector store for fast similarity search.
- **AI-Powered Q&A**: Answer questions about the PDF content with Google Gemini (`gemini-1.5-flash-latest`).
- **Interactive Interface**: User-friendly interface powered by Streamlit for seamless interaction.

## Getting Started

### Prerequisites

Make sure you have the following installed:

- Python 3.7 or higher
- [Streamlit](https://streamlit.io/)
- Required Python packages (see `requirements.txt`)

### Installation

1. Clone the repository:

```bash
git clone <repository-url>
cd pdf-summarizer-streamlit
```

2. Install the required packages:

```bash
pip install -r requirements.txt
```

3. Set up your Google API key:

Create a `.env` file in the project root and add your Google API key:

```plaintext
GOOGLE_API_KEY=your_google_api_key_here
```

### Running the App

Start the Streamlit app by running:

```bash
streamlit run app.py
```

### Usage

1. Create an account or log in from the sidebar menu.
2. Select a category and upload a PDF file; the document is processed automatically after upload.
3. Enter a question about the PDF content in the chat input field.
4. Receive a detailed response from the AI model.

## Built With

- [Streamlit](https://streamlit.io/)
- [PyPDF2](https://pypi.org/project/PyPDF2/)
- [FAISS](https://github.com/facebookresearch/faiss)
- [Langchain](https://github.com/langchain-ai/langchain)
- [Google Generative AI](https://developers.google.com/)

## Author

- **Harshit Shukla**
- Email: [[email protected]](mailto:[email protected])

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Inspired by the need to efficiently summarize and interact with PDF documents.
- Thanks to the open-source community for providing the libraries and tools that made this project possible.

```

### Explanation:

- **Project Description:** Provides a high-level overview of the app and its features.
- **Getting Started:** Instructions for cloning, installing dependencies, and setting up environment variables.
- **Usage Instructions:** Steps to run the app and interact with it.
- **Technologies Used:** Lists key libraries and frameworks used.
- **Contact Information:** Includes the author's contact details.
- **License Information:** Placeholder for license details.
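
The "Text Chunking" feature named in the README can be illustrated with a small, dependency-free sketch. It assumes plain fixed-size character windows with an overlap; the app itself relies on LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers to break on separators such as paragraphs. The function name `chunk_text` is hypothetical and does not appear in the repository.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 10000, chunk_overlap: int = 1000) -> List[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    Simplified illustration only; not the splitter the app actually uses.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of embedding some text twice.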

Files changed (6)
  1. .env +2 -0
  2. .ignore +1 -0
  3. app.py +300 -0
  4. embed.py +55 -0
  5. requirements.txt +13 -0
  6. user_data.db +0 -0
.env ADDED
@@ -0,0 +1,2 @@
+
+ GOOGLE_API_KEY = '<redacted>'
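
A minimal sketch of how the key in `.env` could be read at start-up without a hard-coded fallback. It assumes the variables have already been loaded into the process environment (the app does this with `load_dotenv()`); the helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key from an environment mapping, or raise if it is missing."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return key
```

Failing loudly when the variable is missing keeps the secret out of source code. Note also that Git only honours a file named `.gitignore`, so a file named `.ignore` does not keep `.env` out of a commit.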
.ignore ADDED
@@ -0,0 +1 @@
+ .env
app.py ADDED
@@ -0,0 +1,300 @@
+ import os
+ import random
+ import sqlite3
+ from datetime import datetime
+
+ import google.generativeai as genai
+ import pytz
+ import requests
+ import streamlit as st
+ from dotenv import load_dotenv
+ from langchain.chains.question_answering import load_qa_chain
+ from langchain.prompts import PromptTemplate
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+ from langchain_community.vectorstores import FAISS
+ from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
+ from PyPDF2 import PdfReader
+ from streamlit_lottie import st_lottie
+
+ from embed import add_user, create_table, peek, verify_user
+
+ # Load environment variables from the .env file
+ load_dotenv()
+
+ # Create the users table if it does not exist yet
+ create_table()
+
+ st.set_page_config(page_title="Chat with PDF", layout="centered")
+
+ # Initialize session state
+ if 'chat_history' not in st.session_state:
+     st.session_state.chat_history = {}
+ if 'flow_messages' not in st.session_state:
+     st.session_state.flow_messages = {}
+
+ def get_greeting_message():
+     """Return a greeting that matches the current hour in IST."""
+     ist = pytz.timezone('Asia/Kolkata')
+     current_datetime_ist = datetime.now(ist)
+     current_hour = current_datetime_ist.hour
+
+     if 5 <= current_hour < 12:
+         return "Good morning!"
+     elif 12 <= current_hour < 18:
+         return "Good afternoon!"
+     else:
+         return "Good evening!"
+
+
+ # Initialize Gemini API; stop early instead of falling back to a hard-coded key
+ google_api_key = os.getenv("GOOGLE_API_KEY")
+ if not google_api_key:
+     st.error("GOOGLE_API_KEY is not set. Add it to the .env file.")
+     st.stop()
+ genai.configure(api_key=google_api_key)
+
+ # Global variable for embeddings
+ embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=google_api_key)
+
+ def get_pdf_text(pdf_docs):
+     """Concatenate the extracted text of every page of every uploaded PDF."""
+     text = ""
+     for pdf in pdf_docs:
+         pdf_reader = PdfReader(pdf)
+         for page in pdf_reader.pages:
+             # Guard against pages that yield no text (e.g. scanned images)
+             text += page.extract_text() or ""
+     return text
+
+ def get_text_chunks(text):
+     text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
+     chunks = text_splitter.split_text(text)
+     return chunks
+
+ def get_vector_store(text_chunks):
+     # Embed the chunks and persist the index to the local "faiss_index" folder
+     vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
+     vector_store.save_local("faiss_index")
+
+ def load_faiss_index():
+     return FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
+
+ def get_conversational_chain():
+     prompt_template = """
+     Answer the question as detailed as possible from the provided context, make sure to provide all the details, if the answer is not in
+     provided context just say, "answer is not available in the context", don't provide the wrong answer\n\n
+     Context:\n {context}?\n
+     Question: \n{question}\n
+
+     Answer:
+     """
+
+     model = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0.3)
+     prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
+     chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)
+     return chain
+
+ def process_user_input(user_question):
+     # Retrieve the most similar chunks and pass them to the QA chain as context
+     new_db = load_faiss_index()
+     docs = new_db.similarity_search(user_question)
+     chain = get_conversational_chain()
+     response = chain({"input_documents": docs, "question": user_question}, return_only_outputs=True)
+     return response["output_text"]
+
+ def load_lottie_url(url: str):
+     r = requests.get(url, timeout=10)
+     if r.status_code != 200:
+         return None
+     return r.json()
+
+
+
+ def login():
+     st.subheader("Login")
+     username = st.text_input("Username")
+     password = st.text_input("Password", type="password")
+
+     if st.button("Login"):
+         user = verify_user(username, password)
+         if user:
+             st.success(f"Logged In as {username}")
+             st.session_state.logged_in = True
+             st.session_state.username = username
+             # st.rerun() restarts the script, so nothing after it runs
+             st.rerun()
+         else:
+             st.error("Username or password is incorrect.")
+     return False
+
+ def signup():
+     st.subheader("Create New Account")
+     new_username = st.text_input("Enter Username")
+     new_password = st.text_input("Enter Password", type="password")
+     confirm_password = st.text_input("Confirm Password", type="password")
+
+     if st.button("Sign Up"):
+         if new_password == confirm_password:
+             try:
+                 add_user(new_username, new_password)
+                 st.success("You have successfully created an account!")
+                 st.info("Go to Login Menu to login")
+             except sqlite3.IntegrityError:
+                 st.error("Username already taken, please choose a different one.")
+         else:
+             st.warning("Passwords do not match.")
+
+ def marketplace(username):
+     # Custom CSS for better aesthetics
+     st.markdown("""
+     <style>
+     .stApp {
+         background-color: #f0f2f6;
+     }
+     .stButton>button {
+         background-color: #4CAF50;
+         color: white;
+         border-radius: 10px;
+     }
+     .stTextInput>div>div>input {
+         border-radius: 10px;
+     }
+     </style>
+     """, unsafe_allow_html=True)
+
+     # Create two columns for layout
+     col1, col2 = st.columns([1, 2])
+
+     with col1:
+         st.subheader(f"Welcome, {username}!")
+
+         # Display current date and time
+         ist = pytz.timezone('Asia/Kolkata')
+         current_datetime_ist = datetime.now(ist)
+         st.write(f"Current Date (IST): {current_datetime_ist.strftime('%Y-%m-%d')}")
+         st.write(f"Current Time (IST): {current_datetime_ist.strftime('%H:%M:%S')}")
+
+         # Add a Lottie animation
+         lottie_url = "https://assets5.lottiefiles.com/packages/lf20_ktwnwv5m.json"
+         lottie_json = load_lottie_url(lottie_url)
+         if lottie_json:
+             st_lottie(lottie_json, speed=1, height=200, key="initial")
+
+         # Category selection
+         sections = ["Astrology", "Biology", "Business", "Chemistry", "Medicine",
+                     "Physics", "Sports", "Life Science", "Spirituality", "Others"]
+         selected_section = st.selectbox("Select a category", sections)
+
+         # File uploader
+         uploaded_file = st.file_uploader(f"Upload a PDF for {selected_section}", type="pdf")
+
+         if uploaded_file:
+             with st.spinner(f"Processing {uploaded_file.name}..."):
+                 pdf_text = get_pdf_text([uploaded_file])
+                 text_chunks = get_text_chunks(pdf_text)
+                 get_vector_store(text_chunks)
+             st.success("Document processed successfully!")
+
+         # Add a fun fact or quote
+         facts = [
+             "Did you know? The first computer programmer was a woman named Ada Lovelace.",
+             "Fun fact: The term 'bug' in computer science originated from an actual moth found in a computer.",
+             "Quote: 'The science of today is the technology of tomorrow.' - Edward Teller"
+         ]
+         st.info(random.choice(facts))
+
+     with col2:
+         st.header(f"Chat about {selected_section}")
+
+         if uploaded_file:
+             # Initialize chat history for the selected section if it doesn't exist
+             if selected_section not in st.session_state.chat_history:
+                 st.session_state.chat_history[selected_section] = {"messages": []}
+
+             # Display chat history
+             for message in st.session_state.chat_history[selected_section]["messages"]:
+                 with st.chat_message("user" if message["is_user"] else "assistant"):
+                     st.write(message["text"])
+
+             # User input
+             user_question = st.chat_input("Ask a question about the document:")
+             if user_question:
+                 st.session_state.chat_history[selected_section]["messages"].append({"is_user": True, "text": user_question})
+
+                 with st.chat_message("user"):
+                     st.write(user_question)
+
+                 with st.chat_message("assistant"):
+                     with st.spinner("Thinking..."):
+                         response = process_user_input(user_question)
+                     st.write(response)
+
+                 st.session_state.chat_history[selected_section]["messages"].append({"is_user": False, "text": response})
+
+             # Clear chat button
+             if st.button("Clear Chat"):
+                 st.session_state.chat_history[selected_section]["messages"] = []
+                 st.rerun()
+
+             # Add a feature to download chat history
+             if st.button("Download Chat History"):
+                 chat_history = "\n".join([f"{'User' if msg['is_user'] else 'AI'}: {msg['text']}" for msg in st.session_state.chat_history[selected_section]["messages"]])
+                 st.download_button(
+                     label="Download",
+                     data=chat_history,
+                     file_name=f"{selected_section}_chat_history.txt",
+                     mime="text/plain"
+                 )
+
+         else:
+             st.info("Please upload a PDF document to start chatting.")
+
+     # Add a feedback section
+     st.subheader("Feedback")
+     feedback = st.text_area("We'd love to hear your thoughts! Please leave your feedback here:")
+     if st.button("Submit Feedback"):
+         # Here you would typically save this feedback to a database
+         st.success("Thank you for your feedback!")
+
+     # Footer
+     st.markdown("---")
+     st.markdown("Created with ❤️ by Harshit S | © 2024 PDF Reader App")
+
+ def main():
+     st.title("Choose the suitable category:")
+
+     if "logged_in" not in st.session_state:
+         st.session_state.logged_in = False
+
+     if st.session_state.logged_in:
+         marketplace(st.session_state.username)
+     else:
+         menu = ["Login", "Sign Up"]
+         choice = st.sidebar.selectbox("Menu", menu)
+
+         if choice == "Login":
+             # A successful login reruns the script, which then shows the marketplace
+             login()
+         elif choice == "Sign Up":
+             signup()
+
+ if __name__ == "__main__":
+     main()
+
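
To make the retrieval step in `process_user_input` more concrete, here is a dependency-free sketch of what a similarity search over chunk embeddings does conceptually. It assumes Euclidean distance and an exhaustive scan; FAISS uses optimized index structures, and the helper names `squared_l2` and `nearest_chunks` are hypothetical, not part of the app or of FAISS.

```python
from typing import List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_chunks(query_vec: Sequence[float],
                   chunk_vecs: Sequence[Sequence[float]],
                   k: int = 4) -> List[int]:
    """Return the indices of the k chunk vectors closest to the query vector."""
    scored = [(squared_l2(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort()  # smallest distance first; ties broken by index
    return [idx for _, idx in scored[:k]]
```

In the app, the query vector would come from embedding the user's question and the chunk vectors from embedding the PDF text; the chunks at the returned indices are what the QA chain receives as context.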
embed.py ADDED
@@ -0,0 +1,55 @@
+ import sqlite3
+
+
+ def create_connection():
+     conn = sqlite3.connect("user_data.db")
+     return conn
+
+
+ def create_table():
+     conn = create_connection()
+     cursor = conn.cursor()
+     cursor.execute("""
+         CREATE TABLE IF NOT EXISTS users(
+             id INTEGER PRIMARY KEY AUTOINCREMENT,
+             username TEXT UNIQUE NOT NULL,
+             password TEXT NOT NULL
+         )
+     """)
+
+     conn.commit()
+     conn.close()
+
+
+ def add_user(username, password):
+     conn = create_connection()
+     cursor = conn.cursor()
+     cursor.execute("""
+         INSERT INTO users(username, password) VALUES (?, ?)
+     """, (username, password))
+     conn.commit()
+     conn.close()
+
+
+ def peek():
+     conn = create_connection()
+     cursor = conn.cursor()
+     cursor.execute("""
+         SELECT * FROM users
+     """)
+     users = cursor.fetchall()
+     for user in users:
+         print(user)
+     conn.close()
+
+
+ def verify_user(username, password):
+     conn = create_connection()
+     cursor = conn.cursor()
+     cursor.execute("""
+         SELECT * FROM users
+         WHERE username = ? AND password = ?
+     """, (username, password))
+     user = cursor.fetchone()
+     conn.close()
+     return user
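
`add_user` and `verify_user` above store and compare the password as plain text. A possible hardening, sketched here with the standard library only, is to store a salted PBKDF2 hash instead. The helper names `hash_password` and `check_password`, the `salt$digest` storage format, and the iteration count are all assumptions for illustration and are not part of this commit.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for the deployment


def hash_password(password: str, salt: bytes = None) -> str:
    """Return 'salt_hex$digest_hex' for storage in the password column."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def check_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)
```

Under this scheme, `add_user` would insert `hash_password(password)`, and `verify_user` would look the row up by username alone and then call `check_password` on the stored value. Rows already written in plain text would need to be migrated or recreated.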
requirements.txt ADDED
@@ -0,0 +1,13 @@
+ chromadb
+ faiss-cpu
+ google-generativeai
+ langchain
+ langchain_community
+ langchain_google_genai
+ Pillow
+ pypdf
+ pyPDF2
+ python-dotenv
+ streamlit
+ streamlit_lottie
+ requests
user_data.db ADDED
Binary file (16.4 kB)