
Hugging Face Implementation Plan

Overview

This document outlines the plan to rebuild the RAG system using Hugging Face's models and capabilities instead of Google Cloud services, while preserving the original cloud implementation as a separate option.

Repository Links

Migration Strategy

The key difference in our approach is to replace all Google Cloud dependencies with Hugging Face models and tools:

  1. Replace Google's Document AI → Use Hugging Face document-understanding models (e.g. microsoft/layoutlm-base-uncased, paired with an OCR engine for raw text extraction)
  2. Replace Vertex AI → Use Hugging Face embedding models (e.g. sentence-transformers/all-MiniLM-L6-v2)
  3. Replace BigQuery → Use a FAISS/Chroma vector store with local storage or Hugging Face Datasets
  4. Replace Cloud Storage → Use Hugging Face's persistent storage
  5. Replace Cloud Run → Use Hugging Face Spaces' continuous execution

Implementation Steps

  1. Set Up New Architecture:

    • Create a revised Dockerfile for Hugging Face Spaces (see the sketch after this list)
    • Set up persistent storage (20GB purchased)
    • Configure the A100 GPU via accelerate (available to Pro users)
  2. Replace Text Processing Pipeline:

    • Create a new OCR module using Transformers document models
    • Implement a chunking system using pure Python
    • Add text cleaning and processing without DocumentAI
  3. Replace Vector Database:

    • Implement FAISS/Chroma for vector storage
    • Use Hugging Face Datasets for persistent indexed storage
    • Create a migration utility to move existing data out of BigQuery
  4. Replace Embedding System:

    • Use sentence-transformers models for embeddings
    • Implement similarity search using FAISS/Chroma
    • Create a compatible API to replace Vertex AI functions
  5. Update Application Layer:

    • Port the existing Flask app to run on Hugging Face Spaces
    • Update file handling to use local storage
    • Create model caching for better performance
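
A minimal Dockerfile sketch for step 1, assuming a Docker-SDK Space that starts the app from hf_app.py on port 7860; the base image and file layout are illustrative, not the confirmed setup:

# Hypothetical Dockerfile for the Hugging Face Space
FROM python:3.10-slim

WORKDIR /app

# Install the HF-specific dependencies first to leverage Docker layer caching
COPY requirements_hf.txt .
RUN pip install --no-cache-dir -r requirements_hf.txt

# Copy the application code
COPY . .

# Spaces route traffic to the port declared as app_port (7860)
EXPOSE 7860

CMD ["python", "hf_app.py"]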

Key Components

  1. Text Processing:
# New approach using Hugging Face models
from transformers import AutoTokenizer
from datasets import Dataset

def chunk_text(text_content, tokenizer, max_tokens=256):
    """Split text into chunks of at most max_tokens tokens each."""
    tokens = tokenizer.encode(text_content, add_special_tokens=False)
    return [
        tokenizer.decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

def process_text(text_content):
    """Process text using Hugging Face models."""
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    # Chunk the text on token boundaries
    chunks = chunk_text(text_content, tokenizer)

    # Store in a persistent dataset on disk
    dataset = Dataset.from_dict({"text": chunks})
    dataset.save_to_disk("./data/chunks")

    return dataset
  2. Vector Storage:
# New approach using FAISS
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

class FAISSVectorStore:
    def __init__(self):
        self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
        self.dimension = self.model.get_sentence_embedding_dimension()
        self.index = faiss.IndexFlatL2(self.dimension)
        self.texts = []
        
    def add_texts(self, texts):
        """Embed the texts and add them to the FAISS index."""
        embeddings = self.model.encode(texts)
        self.index.add(np.array(embeddings, dtype=np.float32))
        self.texts.extend(texts)

    def search(self, query, k=5):
        """Return the k stored texts nearest to the query."""
        query_embedding = self.model.encode([query])[0]
        distances, indices = self.index.search(
            np.array([query_embedding], dtype=np.float32), k
        )
        # FAISS pads results with -1 when fewer than k vectors are indexed
        return [self.texts[i] for i in indices[0] if i != -1]
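
A short usage sketch of the store defined above (the sample strings are illustrative):

# Example usage of FAISSVectorStore
store = FAISSVectorStore()
store.add_texts(["Paris is the capital of France.", "FAISS enables fast similarity search."])
print(store.search("What is the capital of France?", k=1))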
  3. Hugging Face Space Configuration (the YAML front matter of the Space's README.md):
title: RAG Document Processing
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
models:
  - sentence-transformers/all-MiniLM-L6-v2
  - facebook/bart-large-cnn
license: apache-2.0

Automation Plan

  1. Background Processing:

    • Implement a file watcher for the persistent storage directory (see the sketch after this list)
    • Process files automatically as they land in the upload directory
    • Use Gradio/Streamlit for the UI, with a background task system
  2. Scheduled Tasks:

    • Use GitHub Actions in the companion GitHub repository for scheduling (see the workflow sketch after this list)
    • Run index maintenance tasks periodically
    • Implement a file-processing queue for batch operations
  3. GitHub Integration:

    • Push processed data to GitHub repository as backup
    • Use GitHub to store model configuration
    • Implement version control for processed data
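
A minimal sketch of the watcher from item 1, assuming a simple polling loop over a persistent-storage path; UPLOAD_DIR and the process_file callback are illustrative names, not confirmed paths:

# Hypothetical polling watcher for the upload directory
import os
import time

UPLOAD_DIR = "/data/uploads"  # assumed persistent-storage path
PROCESSED = set()

def watch(process_file, interval=10):
    """Poll UPLOAD_DIR and hand each new file to process_file."""
    while True:
        for name in os.listdir(UPLOAD_DIR):
            path = os.path.join(UPLOAD_DIR, name)
            if path not in PROCESSED and os.path.isfile(path):
                process_file(path)
                PROCESSED.add(path)
        time.sleep(interval)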
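
And a sketch of item 2 as a scheduled GitHub Actions workflow; the cron time and Space id are placeholders, HF_TOKEN is assumed to be set as a repository secret, and restart_space (a real huggingface_hub call) stands in for whatever maintenance task is run:

# .github/workflows/maintenance.yml (hypothetical)
name: nightly-maintenance
on:
  schedule:
    - cron: "0 3 * * *"  # daily at 03:00 UTC
jobs:
  maintain:
    runs-on: ubuntu-latest
    steps:
      - run: pip install huggingface_hub
      - run: python -c "from huggingface_hub import HfApi; HfApi(token='${{ secrets.HF_TOKEN }}').restart_space('Ultronprime/cloud-rag-webhook')"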

Required Libraries

transformers==4.40.0
datasets==2.17.1
sentence-transformers==2.3.1
faiss-cpu==1.7.4  # or faiss-gpu for CUDA support
gradio==4.19.2
streamlit==1.32.0
langchain==0.1.5
torch==2.1.2
accelerate==0.28.0

Hardware Requirements

  • Use the A100 quota included with Hugging Face Pro (ZeroGPU)
  • Configure model inference for optimal performance on GPU
  • Set up model caching to reduce memory usage and load time (see the sketch below)
  • Utilize Hugging Face's persistent storage (20GB)
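
A minimal caching sketch for the last two points, assuming one shared embedding model per process; get_embedder is an illustrative name:

# Hypothetical model cache: load each model once and reuse it
from functools import lru_cache
from sentence_transformers import SentenceTransformer

@lru_cache(maxsize=2)
def get_embedder(name="sentence-transformers/all-MiniLM-L6-v2"):
    """Load an embedding model once; later calls return the cached instance."""
    return SentenceTransformer(name)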

Project Goals

Create a fully self-contained RAG system on Hugging Face:

  1. Process text files automatically
  2. Generate embeddings with Hugging Face models
  3. Store vectors in FAISS/Chroma on persistent storage
  4. Query the data with a simple API (sketched below)
  5. Run continuously "under the hood"
  6. Utilize Hugging Face Pro benefits (A100 GPU, persistent storage)
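
To make goals 2-5 concrete, a sketch of the query path: retrieve the nearest chunks from the FAISS store above, then condense them with facebook/bart-large-cnn, the summarizer listed in the Space config; rag_query and its store parameter are illustrative names:

# Hypothetical query flow: retrieve with FAISS, condense with BART
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def rag_query(store, question, k=3):
    """Fetch the k nearest chunks and summarize them into an answer."""
    context = " ".join(store.search(question, k=k))
    return summarizer(context, max_length=130, min_length=30)[0]["summary_text"]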

Implementation Files

We'll create the following new files to implement the Hugging Face version:

  1. hf_process_text.py - Text processing with HF models
  2. hf_embeddings.py - Embedding generation with sentence-transformers
  3. hf_vector_store.py - FAISS/Chroma implementation
  4. hf_app.py - Gradio/Streamlit interface
  5. hf_rag_query.py - Query interface for HF models
  6. requirements_hf.txt - HF-specific dependencies

This will allow us to maintain both implementations in parallel.
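
For reference, a minimal sketch of what hf_app.py could look like with Gradio; the import assumes the hf_vector_store module planned above:

# Hypothetical hf_app.py: Gradio front end over the vector store
import gradio as gr
from hf_vector_store import FAISSVectorStore  # assumed module from this plan

store = FAISSVectorStore()

def answer(question):
    """Return the top matching chunks for a question."""
    return "\n\n".join(store.search(question, k=3))

demo = gr.Interface(fn=answer, inputs="text", outputs="text",
                    title="RAG Document Processing")
demo.launch(server_name="0.0.0.0", server_port=7860)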