Hugging Face Implementation Plan
Overview
This document outlines the plan to rebuild the RAG system using Hugging Face's models and capabilities instead of Google Cloud services, while preserving the original cloud implementation as a separate option.
Repository Links
- GitHub: https://github.com/Daanworg/cloud-rag-webhook
- Hugging Face Space: https://huggingface.co/spaces/Ultronprime/cloud-rag-webhook
Migration Strategy
The key difference in our approach is to replace all Google Cloud dependencies with Hugging Face models and tools:
- Replace Google's Document AI → use Hugging Face OCR models (e.g. microsoft/layoutlm-base-uncased)
- Replace Vertex AI → use Hugging Face embedding models (e.g. sentence-transformers/all-MiniLM-L6-v2); a minimal sketch of this swap follows the list
- Replace BigQuery → use a FAISS/Chroma vector store with local storage or Hugging Face Datasets
- Replace Cloud Storage → use Hugging Face's persistent storage
- Replace Cloud Run → use Hugging Face Spaces continuous execution
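To make the embedding swap concrete, here is a minimal sketch: where the cloud version calls a Vertex AI embedding endpoint, the Hugging Face version encodes locally with sentence-transformers. The function name embed_texts is hypothetical; the model name comes from the list above.
from sentence_transformers import SentenceTransformer

# Loaded once at startup; weights are fetched from the Hugging Face Hub on first use.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def embed_texts(texts):
    """Local replacement for the Vertex AI embedding call (hypothetical name)."""
    # Returns one 384-dimensional vector per input text.
    return model.encode(texts, show_progress_bar=False)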
Implementation Steps
Set Up New Architecture:
- Create a revised Dockerfile for Hugging Face
- Set up persistent storage (20GB purchased)
- Configure the A100 GPU using accelerate for Pro users (a device-setup sketch follows this list)
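A minimal device-setup sketch, assuming accelerate from the requirements list below; Accelerator resolves to the GPU when one is attached to the Space and falls back to CPU otherwise:
import torch
from accelerate import Accelerator

# Accelerator picks the best available device (CUDA on a GPU Space, CPU otherwise).
accelerator = Accelerator()
device = accelerator.device
print(f"Running on {device}; CUDA available: {torch.cuda.is_available()}")

def to_device(model):
    """Illustrative helper: move any torch model onto the resolved device."""
    return model.to(device)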
Replace Text Processing Pipeline:
- Create a new OCR module using Transformers document models (an OCR sketch follows this list)
- Implement a chunking system using pure Python
- Add text cleaning and processing without DocumentAI
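As an OCR sketch, the snippet below uses TrOCR, a Transformers OCR model, in place of the LayoutLM model named above (LayoutLM is primarily a layout-understanding model rather than an OCR engine); the model choice is an assumption, not part of the original plan:
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# TrOCR: an encoder-decoder model that reads printed text from page images.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
ocr_model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

def ocr_image(path):
    """Extract text from a single page image (illustrative helper)."""
    image = Image.open(path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = ocr_model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]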
Replace Vector Database:
- Implement FAISS/Chroma for vector storage
- Use Hugging Face Datasets for persistent indexed storage
- Create a migration utility to move data from BigQuery (a sketch follows this list)
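A hedged sketch of that migration utility, assuming the source table has text and embedding columns (hypothetical names; adjust to the real cloud-rag-webhook schema). It reads rows out of BigQuery and writes them to a Hugging Face Dataset on persistent storage:
from google.cloud import bigquery
from datasets import Dataset

def migrate_from_bigquery(table_id, out_dir="./data/migrated"):
    """One-shot copy of chunk rows from BigQuery into a local HF Dataset."""
    client = bigquery.Client()
    # `text` and `embedding` are assumed column names in the source table.
    rows = client.query(f"SELECT text, embedding FROM `{table_id}`").result()
    records = {"text": [], "embedding": []}
    for row in rows:
        records["text"].append(row["text"])
        records["embedding"].append(row["embedding"])
    dataset = Dataset.from_dict(records)
    dataset.save_to_disk(out_dir)
    return dataset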
Replace Embedding System:
- Use sentence-transformers models for embeddings
- Implement similarity search using FAISS/Chroma
- Create a compatible API to replace Vertex AI functions (a shim sketch follows this list)
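A minimal shim sketch for that compatibility layer; the class name and method shape are hypothetical, chosen so call sites written against the Vertex AI client need only swap the import:
from sentence_transformers import SentenceTransformer

class EmbeddingClient:
    """Hypothetical drop-in for the Vertex AI embedding client used elsewhere."""

    def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)

    def get_embeddings(self, texts):
        # Mirrors the Vertex AI call shape: a list of texts in, a list of vectors out.
        return [vec.tolist() for vec in self.model.encode(texts)]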
Update Application Layer:
- Modify Flask app to run on Hugging Face
- Update file handling to use local storage
- Create model caching for better performance (a caching sketch follows this list)
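A minimal caching sketch, assuming models are loaded lazily on first request; functools.lru_cache keeps one instance per model name so repeated requests never reload the weights:
from functools import lru_cache
from sentence_transformers import SentenceTransformer

@lru_cache(maxsize=4)
def get_model(model_name: str) -> SentenceTransformer:
    """Load a model once per process; later calls return the cached instance."""
    return SentenceTransformer(model_name)

# First call loads the weights; subsequent calls are effectively free.
embedder = get_model("sentence-transformers/all-MiniLM-L6-v2")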
Key Components
- Text Processing:
# New approach using Hugging Face models
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import Dataset

def chunk_text(text_content, chunk_size=500, overlap=50):
    """Minimal pure-Python chunker: fixed-size character windows with overlap."""
    step = chunk_size - overlap
    return [text_content[i:i + chunk_size] for i in range(0, len(text_content), step)]

def process_text(text_content):
    """Process text using Hugging Face models."""
    # Loaded for later chunk classification/filtering; unused in this minimal version.
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
    # Process and chunk the text
    chunks = chunk_text(text_content)
    # Store in a persistent dataset on the Space's storage
    dataset = Dataset.from_dict({"text": chunks})
    dataset.save_to_disk("./data/chunks")
    return dataset
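The saved chunks can be reloaded directly for indexing or inspection; a short usage check:
from datasets import Dataset

chunks = Dataset.load_from_disk("./data/chunks")
print(chunks[0]["text"])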
- Vector Storage:
# New approach using FAISS
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

class FAISSVectorStore:
    def __init__(self):
        self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
        self.dimension = self.model.get_sentence_embedding_dimension()
        # Exact L2 search; swap in an IVF index if the corpus outgrows brute force.
        self.index = faiss.IndexFlatL2(self.dimension)
        self.texts = []

    def add_texts(self, texts):
        embeddings = self.model.encode(texts)
        self.index.add(np.array(embeddings, dtype=np.float32))
        self.texts.extend(texts)

    def search(self, query, k=5):
        query_embedding = self.model.encode([query])[0]
        distances, indices = self.index.search(
            np.array([query_embedding], dtype=np.float32), k
        )
        # FAISS pads results with -1 when fewer than k vectors are indexed.
        return [self.texts[i] for i in indices[0] if i != -1]
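Illustrative usage of the store:
store = FAISSVectorStore()
store.add_texts(["First document chunk.", "Second document chunk."])
print(store.search("first chunk", k=1))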
- Hugging Face Space Configuration:
title: RAG Document Processing
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
models:
- sentence-transformers/all-MiniLM-L6-v2
- facebook/bart-large-cnn
license: apache-2.0
Automation Plan
Background Processing:
- Implement a file watcher for the persistent storage directory (a watcher sketch follows this list)
- Process files automatically when added to upload directory
- Use Gradio/Streamlit for UI with background task system
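A minimal watcher sketch, assuming the watchdog package (not yet in the requirements list below) and an illustrative upload path; process_file is a hypothetical entry point into the processing pipeline:
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

def process_file(path):
    """Hypothetical pipeline entry point; wire this to the real processing code."""
    print(f"Processing {path}")

class UploadHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Ignore directory events; hand each new file to the pipeline.
        if not event.is_directory:
            process_file(event.src_path)

def watch_uploads(path="./data/uploads"):
    observer = Observer()
    observer.schedule(UploadHandler(), path, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()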
Scheduled Tasks:
- Use GitHub Actions in the paired GitHub repository for scheduling
- Run index maintenance tasks periodically
- Implement file processing queue for batch operations
GitHub Integration:
- Push processed data to the GitHub repository as backup (a git sketch follows this list)
- Use GitHub to store model configuration
- Implement version control for processed data
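A hedged sketch of the backup push using plain git via subprocess; it assumes the data directory is a clone of the GitHub repository with push credentials already configured (e.g. a token stored as a Space secret):
import subprocess

def backup_to_github(repo_dir="./data", message="Automated data backup"):
    """Commit and push processed data to the backing GitHub repository."""
    subprocess.run(["git", "-C", repo_dir, "add", "-A"], check=True)
    # git commit exits non-zero when there is nothing to commit; only push on success.
    result = subprocess.run(["git", "-C", repo_dir, "commit", "-m", message])
    if result.returncode == 0:
        subprocess.run(["git", "-C", repo_dir, "push"], check=True)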
Required Libraries
transformers==4.40.0
datasets==2.17.1
sentence-transformers==2.3.1
faiss-cpu==1.7.4 # or faiss-gpu for CUDA support
gradio==4.19.2
streamlit==1.32.0
langchain==0.1.5
torch==2.1.2
accelerate==0.28.0
Hardware Requirements
- Use Hugging Face Pro's ZeroGPU tier for on-demand A100 access (a usage sketch follows this list)
- Configure model inference for optimal performance on GPU
- Set up model caching to reduce memory usage
- Utilize Hugging Face's persistent storage (20GB)
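A minimal ZeroGPU sketch, assuming the spaces package and the ZeroGPU hardware setting; the decorator attaches a GPU only for the duration of the call. Note that ZeroGPU currently requires the Gradio SDK, so the Docker-based configuration above may need adjusting if this route is taken:
import spaces
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

@spaces.GPU  # a GPU is attached only while this function executes
def embed_on_gpu(texts):
    model.to("cuda")
    return model.encode(texts)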
Project Goals
Create a fully self-contained RAG system on Hugging Face:
- Process text files automatically
- Generate embeddings with Hugging Face models
- Store vectors in FAISS/Chroma on persistent storage
- Query the data with a simple API
- Run continuously "under the hood"
- Utilize Hugging Face Pro benefits (A100 GPU, persistent storage)
Implementation Files
We'll create the following new files to implement the Hugging Face version:
- hf_process_text.py - Text processing with HF models
- hf_embeddings.py - Embedding generation with sentence-transformers
- hf_vector_store.py - FAISS/Chroma implementation
- hf_app.py - Gradio/Streamlit interface
- hf_rag_query.py - Query interface for HF models
- requirements_hf.txt - HF-specific dependencies
This will allow us to maintain both implementations in parallel.