langchain
langchain-community
pdfminer.six
sentence-transformers
chromadb
python-docx
docx2txt
# PyMuPDF provides the `fitz` module; the unrelated "fitz" and "frontend" packages on PyPI should not be installed
PyMuPDF
streamlit
boto3
gradio
huggingface_hub
# pytesseract is only a wrapper; the Tesseract OCR engine itself must be installed separately (it is not a pip package)
pytesseract
pillow
numpy