Clelia (Astra) Bertelli PRO

as-cle-bert

AI & ML interests

Biology + Artificial Intelligence = ❤️ | AI for sustainable development, sustainable development for AI | Researching Machine Learning Enhancement | I love automation for everyday things | Blogger | Open Source

Organizations

Social Post Explorers, Hugging Face Discord Community, GreenFit AI

as-cle-bert's activity

replied to their post 4 days ago

Hi!

I generally use LangChain + PyPDF; here is a code snippet:

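from langchain_community.document_loaders import PyPDFLoader

def preprocess(pdf: str) -> list[str]:
    """
    Uses LangChain's PyPDFLoader to extract text.
    Prints each page's text and returns the texts as a list.
    """
    loader = PyPDFLoader(pdf)
    # load() yields one Document per PDF page
    documents = loader.load()
    for doc in documents:
        print(doc.page_content)
    return [doc.page_content for doc in documents]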

This should give a more solid result :)

PS: LangChain is distributed under the MIT license; see their GitHub (https://github.com/langchain-ai/langchain)

posted an update 5 days ago
🚀 New demo alert 🚀

Convert (almost) everything to PDF with PdfItDown, now on Spaces! 👉 as-cle-bert/pdfitdown

You can also install it locally:

python3 -m pip install pdfitdown


Don't forget to star it on GitHub if you find it useful! 👉 https://www.github.com/AstraBert/PdfItDown

posted an update 9 days ago
Hi Hugging Face Community 🤗, I am thrilled to announce:

qdurllm v1-rc.1 (https://github.com/AstraBert/qdurllm/tree/january-2025)

Qdurllm (Qdrant, URLs, Large Language Models) is a local Gradio application that lets you upload your web content to a local Qdrant database and search through it or chat with it.

The new pre-release (https://github.com/AstraBert/qdurllm/releases/tag/v1.0.0-rc.0) implements sparse search (with prithivida/Splade_PP_en_v1) + reranking (with nomic-ai/modernbert-embed-base by Hugging Face + Nomic AI) and semantic caching (based on Qdrant). It also switches from google/gemma-2-2b-it to Qwen/Qwen2.5-1.5B-Instruct, to conform to the SOTA landscape and to finally make the application based only on truly open models.
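
To illustrate the semantic caching idea mentioned above, here is a minimal sketch in plain Python. It is not qdurllm's actual code: the `SemanticCache` class, the toy bag-of-words `embed` function, and the 0.8 threshold are all assumptions made for illustration. A real setup would use an embedding model and a vector database such as Qdrant in their place.

```python
import math
import re
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache: reuses an answer when a new question is similar enough."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # list of (question vector, answer) pairs

    def put(self, question: str, answer: str) -> None:
        self.entries.append((embed(question), answer))

    def get(self, question: str) -> Optional[str]:
        query = embed(question)
        best_score, best_answer = 0.0, None
        # Linear scan; a vector database would do this lookup with an index.
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("What is Qdrant?", "Qdrant is a vector database.")
print(cache.get("what is qdrant"))            # hit: prints the cached answer
print(cache.get("How do I install Gradio?"))  # miss: prints None
```

The point of the threshold is that only questions close enough to a cached one skip the language model call; everything else falls through to the normal search and generation path.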

The pre-release is available for testing, and I would be really happy if you gave it a try and left your feedback on the discussion thread on GitHub (https://github.com/AstraBert/qdurllm/discussions/8) or here on the Hugging Face forum via comments under this post ✨.
Find all the information to install and launch it here 👉 https://astrabert.github.io/qdurllm/#2-installation

replied to their post 14 days ago

Thank you so much for letting me know! This is indeed a very interesting role :)