Hi!
I generally use LangChain + PyPDF. Here is a code snippet:
from langchain_community.document_loaders import PyPDFLoader

def preprocess(pdf: str) -> list:
    """
    Uses LangChain's PyPDFLoader to extract text, one Document per page.
    """
    loader = PyPDFLoader(pdf)
    documents = loader.load()
    for doc in documents:
        print(doc.page_content)
    # Return the pages so the declared list return type holds
    return documents
This should give a more solid result :)
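If you would rather get the text back as one string instead of printing it page by page, here is a rough sketch. The names `combine_pages` and `pdf_to_text` are made up for this example, and I am assuming `langchain-community` and `pypdf` are both installed, so treat it as a starting point rather than the one right way:

```python
def combine_pages(documents, separator="\n\n"):
    """Join the text of each page into one string, skipping empty pages."""
    # Works with any objects exposing `page_content`, as LangChain Documents do.
    texts = [doc.page_content.strip() for doc in documents]
    return separator.join(text for text in texts if text)


def pdf_to_text(pdf: str) -> str:
    """Load a PDF with PyPDFLoader and return its text as one string."""
    # Hypothetical helper: the import lives here so combine_pages
    # can be used (and tested) without LangChain installed.
    from langchain_community.document_loaders import PyPDFLoader

    return combine_pages(PyPDFLoader(pdf).load())
```

Calling `pdf_to_text("my_file.pdf")` (the file name is just a placeholder) should then give you the whole document as a single string that you can pass on to whatever comes next.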
PS: LangChain is distributed under the MIT license, see their GitHub (https://github.com/langchain-ai/langchain)
python3 -m pip install pdfitdown
Thank you so much for letting me know! This is indeed a very interesting role :)