PyPDF2
streamlit
google-generativeai
python-dotenv
python-docx
newspaper3k
lxml_html_clean