Upload 6 files
```markdown
# PDF Summarizer Streamlit App
Welcome to the PDF Summarizer Streamlit App! This application allows users to upload PDF documents and interactively ask questions about the content, receiving detailed answers generated by a powerful AI model.
## Features
- **PDF Upload**: Upload a PDF file for a selected category through the Streamlit interface.
- **Text Extraction**: Automatically extract text from the uploaded PDF documents.
- **Text Chunking**: Split extracted text into manageable chunks for efficient processing.
- **Vector Store**: Store text embeddings in a FAISS vector store for fast similarity search.
- **AI-Powered Q&A**: Use Google's Gemini model (`gemini-1.5-flash-latest`) to answer questions based on the PDF content.
- **Interactive Interface**: User-friendly interface powered by Streamlit for seamless interaction.
## Getting Started
### Prerequisites
Make sure you have the following installed:
- Python 3.7 or higher
- [Streamlit](https://streamlit.io/)
- Required Python packages (see `requirements.txt`)
### Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd pdf-summarizer-streamlit
```
2. Install the required packages:
```bash
pip install -r requirements.txt
```
3. Set up your Google API key:
Create a `.env` file in the project root and add your Google API key:
```plaintext
GOOGLE_API_KEY=your_google_api_key_here
```
### Running the App
Start the Streamlit app by running:
```bash
streamlit run app.py
```
### Usage
1. Sign up or log in using the sidebar menu.
2. Select a category and upload a PDF file; the document is processed automatically.
3. Enter a question related to the PDF content in the chat input field.
4. Receive a detailed response from the AI model.
## Built With
- [Streamlit](https://streamlit.io/)
- [PyPDF2](https://pypi.org/project/PyPDF2/)
- [FAISS](https://github.com/facebookresearch/faiss)
- [Langchain](https://github.com/langchain-ai/langchain)
- [Google Generative AI](https://developers.google.com/)
## Author
- **Harshit Shukla**
- Email: [[email protected]](mailto:[email protected])
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Inspired by the need to efficiently summarize and interact with PDF documents.
- Thanks to the open-source community for providing the libraries and tools that made this project possible.
```
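The Features list in the README describes a retrieve-then-answer pipeline: chunks are embedded, stored, and the chunks closest to the question are handed to the model. As a rough illustration of the similarity-search step only, the sketch below ranks toy vectors by cosine similarity. The helper names `cosine_similarity` and `top_k` are hypothetical, the vectors are made up, and a real FAISS index may use a different metric and is far faster than this linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          vectors: List[Sequence[float]],
          k: int = 4) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "chunk embeddings" (assumed values, purely for illustration).
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best = top_k([1.0, 0.0], chunk_vectors, k=2)
```

In the app itself this step is delegated to the vector store; the sketch only shows the idea of ranking chunks by closeness to the question.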
### Explanation:
- **Project Description:** Provides a high-level overview of the app and its features.
- **Getting Started:** Instructions for cloning, installing dependencies, and setting up environment variables.
- **Usage Instructions:** Steps to run the app and interact with it.
- **Technologies Used:** Lists key libraries and frameworks used.
- **Contact Information:** Includes the author's contact details.
- **License Information:** Placeholder for license details.
- .env +2 -0
- .ignore +1 -0
- app.py +300 -0
- embed.py +55 -0
- requirements.txt +13 -0
- user_data.db +0 -0
--- /dev/null
+++ b/.env
@@ -0,0 +1,2 @@
+
+GOOGLE_API_KEY = 'your_google_api_key_here'
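Since the key lives in `.env`, the application has to read it from the environment at startup and should fail loudly when it is missing rather than fall back to a value baked into the source. A minimal sketch of that check, using only the standard library; `load_api_key` is a hypothetical helper name, and it takes the environment as a parameter so it can be exercised without touching the real process environment.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment or raise if it is absent.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        # Fail fast instead of silently using a hard-coded fallback key.
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Example with an explicit mapping (the value is a placeholder, not a real key).
key = load_api_key({"GOOGLE_API_KEY": "placeholder-key"})
```

With `python-dotenv`, calling `load_dotenv()` before this helper would populate `os.environ` from the `.env` file.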
--- /dev/null
+++ b/.ignore
@@ -0,0 +1 @@
+.env
--- /dev/null
+++ b/app.py
@@ -0,0 +1,300 @@
+import os
+import sqlite3
+
+import streamlit as st
+
+import google.generativeai as genai
+
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+
+from langchain_community.vectorstores import FAISS
+from langchain.chains.question_answering import load_qa_chain
+from langchain.prompts import PromptTemplate
+
+from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
+
+from datetime import datetime
+from PyPDF2 import PdfReader
+
+import pytz
+
+from dotenv import load_dotenv
+
+from streamlit_lottie import st_lottie
+import requests
+import random
+
+# Load environment variables from the .env file
+load_dotenv()
+
+from embed import add_user, create_table, verify_user
+
+# Create the users table
+create_table()
+
+st.set_page_config(page_title="Chat with PDF", layout="centered")
+
+# Initialize session state
+if 'chat_history' not in st.session_state:
+    st.session_state.chat_history = {}
+if 'flow_messages' not in st.session_state:
+    st.session_state.flow_messages = {}
+
+def get_greeting_message():
+    """Return a greeting based on the current hour in IST."""
+    ist = pytz.timezone('Asia/Kolkata')
+    current_datetime_ist = datetime.now(ist)
+    current_hour = current_datetime_ist.hour
+
+    if 5 <= current_hour < 12:
+        return "Good morning!"
+    elif 12 <= current_hour < 18:
+        return "Good afternoon!"
+    else:
+        return "Good evening!"
+
+
+# Initialize Gemini API; the key must come from the environment, never from source code
+google_api_key = os.getenv("GOOGLE_API_KEY")
+if not google_api_key:
+    st.error("GOOGLE_API_KEY is not set. Add it to your .env file.")
+    st.stop()
+genai.configure(api_key=google_api_key)
+
+# Global variable for embeddings
+embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=google_api_key)
+
+def get_pdf_text(pdf_docs):
+    """Concatenate the extracted text of every page of every uploaded PDF."""
+    text = ""
+    for pdf in pdf_docs:
+        pdf_reader = PdfReader(pdf)
+        for page in pdf_reader.pages:
+            # extract_text() may return None or an empty string for image-only pages
+            text += page.extract_text() or ""
+    return text
+
+def get_text_chunks(text):
+    """Split the text into overlapping chunks suitable for embedding."""
+    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
+    chunks = text_splitter.split_text(text)
+    return chunks
+
+def get_vector_store(text_chunks):
+    """Embed the chunks and persist the FAISS index to disk."""
+    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
+    vector_store.save_local("faiss_index")
+
+def load_faiss_index():
+    """Load the FAISS index written by get_vector_store()."""
+    return FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
+
+def get_conversational_chain():
+    """Build the question-answering chain with the context-only prompt."""
+    prompt_template = """
+    Answer the question as detailed as possible from the provided context, make sure to provide all the details, if the answer is not in
+    provided context just say, "answer is not available in the context", don't provide the wrong answer\n\n
+    Context:\n {context}?\n
+    Question: \n{question}\n
+
+    Answer:
+    """
+
+    model = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0.3)
+    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
+    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)
+    return chain
+
+def process_user_input(user_question):
+    """Retrieve the chunks most similar to the question and answer from them."""
+    new_db = load_faiss_index()
+    docs = new_db.similarity_search(user_question)
+    chain = get_conversational_chain()
+    response = chain({"input_documents": docs, "question": user_question}, return_only_outputs=True)
+    print(response)
+    return response["output_text"]
+
+def load_lottie_url(url: str):
+    """Fetch a Lottie animation as JSON, or return None on a non-200 response."""
+    r = requests.get(url)
+    if r.status_code != 200:
+        return None
+    return r.json()
+
+
+
+def login():
+    """Render the login form and mark the session as logged in on success."""
+    st.subheader("Login")
+    username = st.text_input("Username")
+    password = st.text_input("Password", type="password")
+
+    if st.button("Login"):
+        user = verify_user(username, password)
+        if user:
+            st.success(f"Logged In as {username}")
+            st.session_state.logged_in = True
+            st.session_state.username = username
+            st.rerun()
+            return True
+        else:
+            st.error("Username or password is incorrect.")
+    return False
+
+def signup():
+    """Render the sign-up form and create the account if the passwords match."""
+    st.subheader("Create New Account")
+    new_username = st.text_input("Enter Username")
+    new_password = st.text_input("Enter Password", type="password")
+    confirm_password = st.text_input("Confirm Password", type="password")
+
+    if st.button("Sign Up"):
+        if new_password == confirm_password:
+            try:
+                add_user(new_username, new_password)
+                st.success("You have successfully created an account!")
+                st.info("Go to Login Menu to login")
+            except sqlite3.IntegrityError:
+                st.error("Username already taken, please choose a different one.")
+        else:
+            st.warning("Passwords do not match.")
+
+def marketplace(username):
+    # Custom CSS for better aesthetics
+    st.markdown("""
+    <style>
+    .stApp {
+        background-color: #f0f2f6;
+    }
+    .stButton>button {
+        background-color: #4CAF50;
+        color: white;
+        border-radius: 10px;
+    }
+    .stTextInput>div>div>input {
+        border-radius: 10px;
+    }
+    </style>
+    """, unsafe_allow_html=True)
+
+    # Create two columns for layout
+    col1, col2 = st.columns([1, 2])
+
+    with col1:
+        st.subheader(f"Welcome, {username}!")
+
+        # Display current date and time
+        ist = pytz.timezone('Asia/Kolkata')
+        current_datetime_ist = datetime.now(ist)
+        st.write(f"Current Date (IST): {current_datetime_ist.strftime('%Y-%m-%d')}")
+        st.write(f"Current Time (IST): {current_datetime_ist.strftime('%H:%M:%S')}")
+
+        # Add a Lottie animation
+        lottie_url = "https://assets5.lottiefiles.com/packages/lf20_ktwnwv5m.json"
+        lottie_json = load_lottie_url(lottie_url)
+        if lottie_json:
+            st_lottie(lottie_json, speed=1, height=200, key="initial")
+
+        # Category selection
+        sections = ["Astrology", "Biology", "Business", "Chemistry", "Medicine",
+                    "Physics", "Sports", "Life Science", "Spirituality", "Others"]
+        selected_section = st.selectbox("Select a category", sections)
+
+        # File uploader
+        uploaded_file = st.file_uploader(f"Upload a PDF for {selected_section}", type="pdf")
+
+        if uploaded_file:
+            with st.spinner(f"Processing {uploaded_file.name}..."):
+                pdf_text = get_pdf_text([uploaded_file])
+                text_chunks = get_text_chunks(pdf_text)
+                get_vector_store(text_chunks)
+            st.success("Document processed successfully!")
+
+        # Add a fun fact or quote
+        facts = [
+            "Did you know? The first computer programmer was a woman named Ada Lovelace.",
+            "Fun fact: The term 'bug' in computer science originated from an actual moth found in a computer.",
+            "Quote: 'The science of today is the technology of tomorrow.' - Edward Teller"
+        ]
+        st.info(random.choice(facts))
+
+    with col2:
+        st.header(f"Chat about {selected_section}")
+
+        if uploaded_file:
+            # Initialize chat history for the selected section if it doesn't exist
+            if selected_section not in st.session_state.chat_history:
+                st.session_state.chat_history[selected_section] = {"messages": []}
+
+            # Display chat history
+            for message in st.session_state.chat_history[selected_section]["messages"]:
+                with st.chat_message("user" if message["is_user"] else "assistant"):
+                    st.write(message["text"])
+
+            # User input
+            user_question = st.chat_input("Ask a question about the document:")
+            if user_question:
+                st.session_state.chat_history[selected_section]["messages"].append({"is_user": True, "text": user_question})
+
+                with st.chat_message("user"):
+                    st.write(user_question)
+
+                with st.chat_message("assistant"):
+                    with st.spinner("Thinking..."):
+                        response = process_user_input(user_question)
+                    st.write(response)
+
+                st.session_state.chat_history[selected_section]["messages"].append({"is_user": False, "text": response})
+
+            # Clear chat button
+            if st.button("Clear Chat"):
+                st.session_state.chat_history[selected_section]["messages"] = []
+                st.rerun()
+
+            # Add a feature to download chat history
+            if st.button("Download Chat History"):
+                chat_history = "\n".join([f"{'User' if msg['is_user'] else 'AI'}: {msg['text']}" for msg in st.session_state.chat_history[selected_section]["messages"]])
+                st.download_button(
+                    label="Download",
+                    data=chat_history,
+                    file_name=f"{selected_section}_chat_history.txt",
+                    mime="text/plain"
+                )
+
+        else:
+            st.info("Please upload a PDF document to start chatting.")
+
+    # Add a feedback section
+    st.subheader("Feedback")
+    feedback = st.text_area("We'd love to hear your thoughts! Please leave your feedback here:")
+    if st.button("Submit Feedback"):
+        # Here you would typically save this feedback to a database
+        st.success("Thank you for your feedback!")
+
+    # Footer
+    st.markdown("---")
+    st.markdown("Created with ❤️ by Harshit S | © 2024 PDF Reader App")
+
+def main():
+    st.title("Choose the suitable category:")
+
+    if "logged_in" not in st.session_state:
+        st.session_state.logged_in = False
+
+    if st.session_state.logged_in:
+        marketplace(st.session_state.username)
+    else:
+        menu = ["Login", "Sign Up"]
+        choice = st.sidebar.selectbox("Menu", menu)
+
+        if choice == "Login":
+            call = login()
+            if call:
+                main()
+        elif choice == "Sign Up":
+            signup()
+
+if __name__ == "__main__":
+    main()
+
+
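In `app.py`, `get_text_chunks` splits the extracted text with `chunk_size=10000` and `chunk_overlap=1000`. To show what the overlap buys, here is a simplified fixed-width sliding window. This is not LangChain's algorithm (`RecursiveCharacterTextSplitter` splits on separators and merges pieces); `chunk_text` is a hypothetical helper written only for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Each chunk repeats the final character of the previous one.
example = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

The repeated characters at chunk boundaries are what keep a sentence that straddles two chunks retrievable from either side.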
--- /dev/null
+++ b/embed.py
@@ -0,0 +1,55 @@
+import sqlite3
+
+
+def create_connection():
+    conn = sqlite3.connect("user_data.db")
+    return conn
+
+
+def create_table():
+    conn = create_connection()
+    cursor = conn.cursor()
+    cursor.execute("""
+        CREATE TABLE IF NOT EXISTS users(
+            id INTEGER PRIMARY KEY AUTOINCREMENT,
+            username TEXT UNIQUE NOT NULL,
+            password TEXT NOT NULL
+        )
+    """)
+
+    conn.commit()
+    conn.close()
+
+
+def add_user(username, password):
+    conn = create_connection()
+    cursor = conn.cursor()
+    cursor.execute("""
+        INSERT INTO users(username, password) VALUES (?, ?)
+    """, (username, password))
+    conn.commit()
+    conn.close()
+
+
+def peek():
+    conn = create_connection()
+    cursor = conn.cursor()
+    cursor.execute("""
+        SELECT * FROM users
+    """)
+    users = cursor.fetchall()
+    for user in users:
+        print(user)
+    conn.close()
+
+
+def verify_user(username, password):
+    conn = create_connection()
+    cursor = conn.cursor()
+    cursor.execute("""
+        SELECT * FROM users
+        WHERE username = ? AND password = ?
+    """, (username, password))
+    user = cursor.fetchone()
+    conn.close()
+    return user
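`embed.py` stores the password exactly as typed and compares it in the SQL `WHERE` clause. A common alternative is to store a salted hash and verify against it. The sketch below uses PBKDF2 from the standard library; `hash_password` and `verify_password` are hypothetical helper names that are not part of this commit, and the iteration count and the `salt$digest` storage format are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> str:
    """Return 'salt_hex$digest_hex' for storage in the password column."""
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


stored_value = hash_password("s3cret")
```

With this approach, `add_user` would store `hash_password(password)`, and `verify_user` would look the row up by username only and then call `verify_password` on the stored value.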
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,13 @@
+chromadb
+faiss-cpu
+google-generativeai
+langchain
+langchain_community
+langchain_google_genai
+Pillow
+pypdf
+pyPDF2
+python-dotenv
+streamlit
+streamlit_lottie
+requests
Binary file user_data.db (16.4 kB)