# Hugging Face Implementation Plan

## Overview

This document outlines the plan to rebuild the RAG system using Hugging Face's models and capabilities instead of Google Cloud services, while preserving the original cloud implementation as a separate option.

## Repository Links

- GitHub: https://github.com/Daanworg/cloud-rag-webhook
- Hugging Face Space: https://huggingface.co/spaces/Ultronprime/cloud-rag-webhook

## Migration Strategy

The key difference in our approach is to **replace all Google Cloud dependencies with Hugging Face models and tools**:

1. **Replace Google's Document AI** → Use Hugging Face document/OCR models (e.g. `microsoft/layoutlm-base-uncased`)
2. **Replace Vertex AI** → Use Hugging Face embedding models (e.g. `sentence-transformers/all-MiniLM-L6-v2`)
3. **Replace BigQuery** → Use a FAISS/Chroma vector store backed by local storage or Hugging Face Datasets
4. **Replace Cloud Storage** → Use Hugging Face's persistent storage
5. **Replace Cloud Run** → Use Hugging Face Spaces continuous execution

## Implementation Steps

1. **Set Up New Architecture**:
   - Create a revised Dockerfile for Hugging Face
   - Set up persistent storage (20 GB purchased)
   - Configure the A100 GPU with `accelerate` for Pro users

2. **Replace Text Processing Pipeline**:
   - Create a new OCR module using Transformers document models
   - Implement a chunking system in pure Python
   - Add text cleaning and processing without Document AI

3. **Replace Vector Database**:
   - Implement FAISS/Chroma for vector storage
   - Use Hugging Face Datasets for persistent indexed storage
   - Create a migration utility to move data out of BigQuery

4. **Replace Embedding System**:
   - Use `sentence-transformers` models for embeddings
   - Implement similarity search with FAISS/Chroma
   - Create a compatible API to replace the Vertex AI functions

5. **Update Application Layer**:
   - Modify the Flask app to run on Hugging Face
   - Update file handling to use local storage
   - Add model caching for better performance

## Key Components

1. **Text Processing**:

```python
# New approach using Hugging Face models
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import Dataset


def process_text(text_content):
    """Process text using Hugging Face models."""
    # Load a Hugging Face tokenizer/model for the processing pipeline
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

    # Process and chunk the text (chunk_text is a pure-Python helper; see the sketch below)
    chunks = chunk_text(text_content)

    # Store the chunks in a persistent dataset on local storage
    dataset = Dataset.from_dict({"text": chunks})
    dataset.save_to_disk("./data/chunks")

    return dataset
```

2. **Vector Storage**:

```python
# New approach using FAISS
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer


class FAISSVectorStore:
    def __init__(self):
        self.model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
        self.dimension = self.model.get_sentence_embedding_dimension()
        self.index = faiss.IndexFlatL2(self.dimension)
        self.texts = []

    def add_texts(self, texts):
        # Embed the chunks and add them to the L2 index
        embeddings = self.model.encode(texts)
        self.index.add(np.array(embeddings, dtype=np.float32))
        self.texts.extend(texts)

    def search(self, query, k=5):
        query_embedding = self.model.encode([query])[0]
        distances, indices = self.index.search(
            np.array([query_embedding], dtype=np.float32), k
        )
        # FAISS pads the result with -1 when fewer than k vectors are indexed
        return [self.texts[i] for i in indices[0] if 0 <= i < len(self.texts)]
```
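The `chunk_text` helper referenced in the text-processing snippet is not yet defined. Below is a minimal pure-Python sketch under assumed defaults (chunk size, overlap, the `sample.txt` input, and the `hf_vector_store` module name are illustrative assumptions, not settled decisions), followed by a short usage example that feeds the chunks into the FAISS store shown above.

```python
# Minimal sketch of the pure-Python chunking helper assumed by process_text above.
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into overlapping character-based chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by the overlap so neighbouring chunks share some context
        start = end - overlap
    return chunks


# Hypothetical usage, assuming FAISSVectorStore lives in hf_vector_store.py as planned.
if __name__ == "__main__":
    from hf_vector_store import FAISSVectorStore

    store = FAISSVectorStore()
    with open("sample.txt", encoding="utf-8") as fh:  # illustrative input file
        store.add_texts(chunk_text(fh.read()))
    print(store.search("What is this document about?", k=3))
```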
3. **Hugging Face Space Configuration**:

```yaml
title: RAG Document Processing
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
models:
  - sentence-transformers/all-MiniLM-L6-v2
  - facebook/bart-large-cnn
license: apache-2.0
```

## Automation Plan

1. **Background Processing**:
   - Implement a file watcher for the persistent storage directory
   - Process files automatically when they are added to the upload directory
   - Use Gradio/Streamlit for the UI with a background task system

2. **Scheduled Tasks**:
   - Use GitHub Actions on the linked repository for scheduling
   - Run index maintenance tasks periodically
   - Implement a file processing queue for batch operations

3. **GitHub Integration**:
   - Push processed data to the GitHub repository as a backup
   - Use GitHub to store model configuration
   - Implement version control for processed data

## Required Libraries

```
transformers==4.40.0
datasets==2.17.1
sentence-transformers==2.3.1
faiss-cpu==1.7.4  # or faiss-gpu for CUDA support
gradio==4.19.2
streamlit==1.32.0
langchain==0.1.5
torch==2.1.2
accelerate==0.28.0
```

## Hardware Requirements

- Use Hugging Face Pro's ZeroGPU tier for free A100 access
- Configure model inference for optimal performance on the GPU
- Set up model caching to reduce memory usage
- Utilize Hugging Face's persistent storage (20 GB)

## Project Goals

Create a fully self-contained RAG system on Hugging Face that can:

1. Process text files automatically
2. Generate embeddings with Hugging Face models
3. Store vectors in FAISS/Chroma on persistent storage
4. Query the data with a simple API
5. Run continuously "under the hood"
6. Utilize Hugging Face Pro benefits (A100 GPU, persistent storage)

## Implementation Files

We'll create the following new files to implement the Hugging Face version:

1. `hf_process_text.py` - Text processing with HF models
2. `hf_embeddings.py` - Embedding generation with sentence-transformers
3. `hf_vector_store.py` - FAISS/Chroma implementation
4. `hf_app.py` - Gradio/Streamlit interface
5. `hf_rag_query.py` - Query interface for HF models
6. `requirements_hf.txt` - HF-specific dependencies

This will allow us to maintain both implementations in parallel.
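As a rough, non-authoritative sketch of how `hf_app.py` might wire these pieces together in Gradio, the snippet below assumes the module and function names from the file list above (`hf_process_text.chunk_text`, `hf_vector_store.FAISSVectorStore`); the UI labels are illustrative only.

```python
# hf_app.py - minimal Gradio sketch (assumed module layout, not existing code)
import gradio as gr

from hf_process_text import chunk_text
from hf_vector_store import FAISSVectorStore

store = FAISSVectorStore()


def index_file(path):
    """Read an uploaded text file, chunk it, and add the chunks to the FAISS index."""
    if path is None:
        return "No file uploaded."
    with open(path, encoding="utf-8") as fh:
        chunks = chunk_text(fh.read())
    store.add_texts(chunks)
    return f"Indexed {len(chunks)} chunks."


def query(question):
    """Return the top matching chunks for a question."""
    results = store.search(question, k=5)
    return "\n\n---\n\n".join(results) if results else "Index is empty."


with gr.Blocks(title="RAG Document Processing") as demo:
    gr.Markdown("# RAG Document Processing")
    with gr.Tab("Upload"):
        file_in = gr.File(label="Text file", type="filepath")
        upload_btn = gr.Button("Index file")
        upload_status = gr.Textbox(label="Status")
        upload_btn.click(index_file, inputs=file_in, outputs=upload_status)
    with gr.Tab("Query"):
        question = gr.Textbox(label="Question")
        answer = gr.Textbox(label="Retrieved chunks", lines=10)
        question.submit(query, inputs=question, outputs=answer)

if __name__ == "__main__":
    # Port matches app_port in the Space configuration above
    demo.launch(server_name="0.0.0.0", server_port=7860)
```

The port is set to 7860 to match `app_port` in the Space configuration; the background file watcher and scheduled maintenance tasks from the Automation Plan would run alongside this interface rather than inside it.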