Clelia (Astra) Bertelli PRO

as-cle-bert

https://www.cleliasportfolio.xyz

AI & ML interests

Recent Activity

new activity about 12 hours ago

greenfit-ai/synthetic-sport-products-sustainability:Librarian Bot: Add language metadata for dataset

liked a model 2 days ago

Alibaba-NLP/gte-modernbert-base

liked a dataset 2 days ago

greenfit-ai/synthetic-sport-products-sustainability

Articles

Organizations

as-cle-bert's activity

New activity in greenfit-ai/synthetic-sport-products-sustainability about 12 hours ago

Librarian Bot: Add language metadata for dataset

#2 opened 1 day ago by

librarian-bot

liked a model 2 days ago

Alibaba-NLP/gte-modernbert-base

liked a dataset 2 days ago

greenfit-ai/synthetic-sport-products-sustainability

Viewer • Updated about 12 hours ago • 100 • 8 • 1

updated a Space 2 days ago

Running

📚

Pdfitdown

Convert (almost) everything to PDF!

updated a dataset 3 days ago

greenfit-ai/synthetic-sport-products-sustainability

Viewer • Updated about 12 hours ago • 100 • 8 • 1

published a dataset 3 days ago

greenfit-ai/synthetic-sport-products-sustainability

Viewer • Updated about 12 hours ago • 100 • 8 • 1

liked a Space 4 days ago

Running

376

🧬

Synthetic Data Generator

Build datasets using natural language

liked 2 models 4 days ago

Qdrant/bm25

nomic-ai/modernbert-embed-base

replied to their post 4 days ago

Hi!

I generally use LangChain + PyPDF. Here is a code snippet:

from langchain_community.document_loaders import PyPDFLoader

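def preprocess(pdf: str) -> list:
    """
    Uses LangChain's PyPDFLoader to extract text.
    Returns the text of each page as a list of strings.
    """
    loader = PyPDFLoader(pdf)  # needs the pypdf package installed
    documents = loader.load()  # one Document per PDF page
    # Return the page texts so the function matches its `-> list` annotation
    return [doc.page_content for doc in documents]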

This should give a more solid result :)

PS: LangChain is distributed under the MIT license, see their GitHub (https://github.com/langchain-ai/langchain)

liked 2 models 4 days ago

OrcaDB/cde-small-v1

Feature Extraction • Updated 22 days ago • 7.33k • 4

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Text Generation • Updated 1 day ago • 70.1k • 346

posted an update 5 days ago

Post

1493

🚀𝐍𝐞𝐰 𝐝𝐞𝐦𝐨 𝐚𝐥𝐞𝐫𝐭🚀

Convert (almost) everything to PDF with 𝐏𝐝𝐟𝐈𝐭𝐃𝐨𝐰𝐧, now on Spaces! 👉 as-cle-bert/pdfitdown

You can also install it locally:

python3 -m pip install pdfitdown

Don't forget to star it on GitHub, if you find it useful! 👉 https://www.github.com/AstraBert/PdfItDown

3 replies

liked a Space 5 days ago

Running

📚

Pdfitdown

Convert (almost) everything to PDF!

published a Space 5 days ago

Running

📚

Pdfitdown

Convert (almost) everything to PDF!

updated a dataset 5 days ago

greenfit-ai/claude-reviewed-sport-sustainability-papers

Viewer • Updated 5 days ago • 24 • 39 • 1

liked a dataset 5 days ago

greenfit-ai/claude-reviewed-sport-sustainability-papers

Viewer • Updated 5 days ago • 24 • 39 • 1

published a dataset 5 days ago

greenfit-ai/claude-reviewed-sport-sustainability-papers

Viewer • Updated 5 days ago • 24 • 39 • 1

posted an update 9 days ago

Post

509

Hi HuggingFace Community🤗, I am thrilled to announce:

𝐪𝐝𝐮𝐫𝐥𝐥𝐦 𝚟𝟷-𝚛𝚌.𝟷 (https://github.com/AstraBert/qdurllm/tree/january-2025)

Qdurllm (𝗤𝗱rant, 𝗨𝗥Ls, 𝗟arge 𝗟anguage 𝗠odels) is a local Gradio application that lets you upload your web content to a local Qdrant database and search through it or chat with it.

The 𝗻𝗲𝘄 𝗽𝗿𝗲-𝗿𝗲𝗹𝗲𝗮𝘀𝗲 (https://github.com/AstraBert/qdurllm/releases/tag/v1.0.0-rc.0) implements 𝘀𝗽𝗮𝗿𝘀𝗲 𝘀𝗲𝗮𝗿𝗰𝗵 (with prithivida/Splade_PP_en_v1) + 𝗿𝗲𝗿𝗮𝗻𝗸𝗶𝗻𝗴 (with nomic-ai/modernbert-embed-base by Hugging Face + Nomic AI) and 𝘀𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗰𝗮𝗰𝗵𝗶𝗻𝗴 (based on Qdrant) and switched 𝗳𝗿𝗼𝗺 google/gemma-2-2b-it 𝘁𝗼 Qwen/Qwen2.5-1.5B-Instruct to conform to the SOTA landscape and to finally make the application based 𝗼𝗻𝗹𝘆 𝗼𝗻 𝘁𝗿𝘂𝗹𝘆 𝗼𝗽𝗲𝗻 𝗺𝗼𝗱𝗲𝗹𝘀.

The pre-release is 𝗮𝘃𝗮𝗶𝗹𝗮𝗯𝗹𝗲 𝗳𝗼𝗿 𝘁𝗲𝘀𝘁𝗶𝗻𝗴 and I would be really happy if you gave it a try and left your feedback on the discussion thread on GitHub (https://github.com/AstraBert/qdurllm/discussions/8) or here on the Hugging Face forum via comments under this post✨.
Find all the information to install and launch it here 👉 https://astrabert.github.io/qdurllm/#2-installation
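
For readers unfamiliar with semantic caching, here is a minimal sketch of the general idea, not the code used in qdurllm. It assumes a toy bag-of-words embedding and an in-memory list in place of a real embedding model and a Qdrant collection; the SemanticCache class, the embed and cosine helpers, and the 0.8 threshold are all hypothetical names and values chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by question similarity."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # illustrative value, tune for a real model
        self.entries = []           # list of (question vector, answer) pairs

    def put(self, question: str, answer: str) -> None:
        self.entries.append((embed(question), answer))

    def get(self, question: str):
        """Return the cached answer of the most similar question, or None."""
        query = embed(question)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache()
cache.put("What is Qdrant?", "A vector database.")
hit = cache.get("what is qdrant")             # same words, so a cache hit
miss = cache.get("How do I install Gradio?")  # unrelated, so the LLM would be called
```

On a hit the stored answer is returned without calling the LLM; on a miss the application would generate an answer and store it with put.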

replied to their post 14 days ago

Thank you so much for letting me know! This is indeed a very interesting role :)

Articles

Search the Web with AI

Debate Championship for LLMs

Building an AI-powered search engine from scratch

streamlit_supabase_auth_ui

AI is turning nuclear: a review

Is AI carbon footprint worrisome?

_Repetita iuvant_: how to improve AI code generation

BrAIn: next generation neurons?

What is going on with AlphaFold3?
