# Hugging Face Implementation Plan

## Overview

This document outlines the plan to rebuild the RAG system using Hugging Face's models and capabilities instead of Google Cloud services, while preserving the original cloud implementation as a separate option.

## Repository Links

- GitHub: https://github.com/Daanworg/cloud-rag-webhook
- Hugging Face Space: https://huggingface.co/spaces/Ultronprime/cloud-rag-webhook

## Migration Strategy

The key difference in our approach is to **replace all Google Cloud dependencies with Hugging Face models and tools**:

1. **Replace Google's Document AI** → Use Hugging Face document/OCR models (e.g. `microsoft/layoutlm-base-uncased`)
2. **Replace Vertex AI** → Use Hugging Face embedding models (e.g. `sentence-transformers/all-MiniLM-L6-v2`)
3. **Replace BigQuery** → Use a FAISS/Chroma vector store backed by local storage or Hugging Face Datasets
4. **Replace Cloud Storage** → Use Hugging Face's persistent storage
5. **Replace Cloud Run** → Use Hugging Face Spaces continuous execution

## Implementation Steps

1. **Set Up New Architecture**:
   - Create a revised Dockerfile for Hugging Face
   - Set up persistent storage (20 GB purchased)
   - Configure the A100 GPU with `accelerate` for Pro users

2. **Replace Text Processing Pipeline**:
   - Create a new OCR module using Transformers document models
   - Implement a chunking system in pure Python
   - Add text cleaning and processing without Document AI

3. **Replace Vector Database**:
   - Implement FAISS/Chroma for vector storage
   - Use Hugging Face Datasets for persistent indexed storage
   - Create a migration utility to move data out of BigQuery

4. **Replace Embedding System**:
   - Use `sentence-transformers` models for embeddings
   - Implement similarity search with FAISS/Chroma
   - Create a compatible API to replace the Vertex AI functions

5. **Update Application Layer**:
   - Modify the Flask app to run on Hugging Face
   - Update file handling to use local storage
   - Add model caching for better performance

## Key Components

1. **Text Processing**:

```python
# New approach using Hugging Face models
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import Dataset


def process_text(text_content):
    """Process text using Hugging Face models."""
    # Load a Hugging Face tokenizer/model for the processing pipeline
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

    # Process and chunk the text (chunk_text is a pure-Python helper; see the sketch below)
    chunks = chunk_text(text_content)

    # Store the chunks in a persistent dataset on local storage
    dataset = Dataset.from_dict({"text": chunks})
    dataset.save_to_disk("./data/chunks")

    return dataset
```

2. **Vector Storage**:

```python
# New approach using FAISS
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer


class FAISSVectorStore:
    def __init__(self):
        self.model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
        self.dimension = self.model.get_sentence_embedding_dimension()
        self.index = faiss.IndexFlatL2(self.dimension)
        self.texts = []

    def add_texts(self, texts):
        # Embed the chunks and add them to the L2 index
        embeddings = self.model.encode(texts)
        self.index.add(np.array(embeddings, dtype=np.float32))
        self.texts.extend(texts)

    def search(self, query, k=5):
        query_embedding = self.model.encode([query])[0]
        distances, indices = self.index.search(
            np.array([query_embedding], dtype=np.float32), k
        )
        # FAISS pads the result with -1 when fewer than k vectors are indexed
        return [self.texts[i] for i in indices[0] if 0 <= i < len(self.texts)]
```
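The `chunk_text` helper referenced in the text-processing snippet is not yet defined. Below is a minimal pure-Python sketch under assumed defaults (chunk size, overlap, the `sample.txt` input, and the `hf_vector_store` module name are illustrative assumptions, not settled decisions), followed by a short usage example that feeds the chunks into the FAISS store shown above.

```python
# Minimal sketch of the pure-Python chunking helper assumed by process_text above.
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into overlapping character-based chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by the overlap so neighbouring chunks share some context
        start = end - overlap
    return chunks


# Hypothetical usage, assuming FAISSVectorStore lives in hf_vector_store.py as planned.
if __name__ == "__main__":
    from hf_vector_store import FAISSVectorStore

    store = FAISSVectorStore()
    with open("sample.txt", encoding="utf-8") as fh:  # illustrative input file
        store.add_texts(chunk_text(fh.read()))
    print(store.search("What is this document about?", k=3))
```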
3. **Hugging Face Space Configuration**:

```yaml
title: RAG Document Processing
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
models:
  - sentence-transformers/all-MiniLM-L6-v2
  - facebook/bart-large-cnn
license: apache-2.0
```

## Automation Plan

1. **Background Processing**:
   - Implement a file watcher for the persistent storage directory
   - Process files automatically when they are added to the upload directory
   - Use Gradio/Streamlit for the UI with a background task system

2. **Scheduled Tasks**:
   - Use GitHub Actions on the linked repository for scheduling
   - Run index maintenance tasks periodically
   - Implement a file processing queue for batch operations

3. **GitHub Integration**:
   - Push processed data to the GitHub repository as a backup
   - Use GitHub to store model configuration
   - Implement version control for processed data

## Required Libraries

```
transformers==4.40.0
datasets==2.17.1
sentence-transformers==2.3.1
faiss-cpu==1.7.4  # or faiss-gpu for CUDA support
gradio==4.19.2
streamlit==1.32.0
langchain==0.1.5
torch==2.1.2
accelerate==0.28.0
```

## Hardware Requirements

- Use Hugging Face Pro's ZeroGPU tier for free A100 access
- Configure model inference for optimal performance on the GPU
- Set up model caching to reduce memory usage
- Utilize Hugging Face's persistent storage (20 GB)

## Project Goals

Create a fully self-contained RAG system on Hugging Face that can:

1. Process text files automatically
2. Generate embeddings with Hugging Face models
3. Store vectors in FAISS/Chroma on persistent storage
4. Query the data with a simple API
5. Run continuously "under the hood"
6. Utilize Hugging Face Pro benefits (A100 GPU, persistent storage)

## Implementation Files

We'll create the following new files to implement the Hugging Face version:

1. `hf_process_text.py` - Text processing with HF models
2. `hf_embeddings.py` - Embedding generation with sentence-transformers
3. `hf_vector_store.py` - FAISS/Chroma implementation
4. `hf_app.py` - Gradio/Streamlit interface
5. `hf_rag_query.py` - Query interface for HF models
6. `requirements_hf.txt` - HF-specific dependencies

This will allow us to maintain both implementations in parallel.
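As a rough, non-authoritative sketch of how `hf_app.py` might wire these pieces together in Gradio, the snippet below assumes the module and function names from the file list above (`hf_process_text.chunk_text`, `hf_vector_store.FAISSVectorStore`); the UI labels are illustrative only.

```python
# hf_app.py - minimal Gradio sketch (assumed module layout, not existing code)
import gradio as gr

from hf_process_text import chunk_text
from hf_vector_store import FAISSVectorStore

store = FAISSVectorStore()


def index_file(path):
    """Read an uploaded text file, chunk it, and add the chunks to the FAISS index."""
    if path is None:
        return "No file uploaded."
    with open(path, encoding="utf-8") as fh:
        chunks = chunk_text(fh.read())
    store.add_texts(chunks)
    return f"Indexed {len(chunks)} chunks."


def query(question):
    """Return the top matching chunks for a question."""
    results = store.search(question, k=5)
    return "\n\n---\n\n".join(results) if results else "Index is empty."


with gr.Blocks(title="RAG Document Processing") as demo:
    gr.Markdown("# RAG Document Processing")
    with gr.Tab("Upload"):
        file_in = gr.File(label="Text file", type="filepath")
        upload_btn = gr.Button("Index file")
        upload_status = gr.Textbox(label="Status")
        upload_btn.click(index_file, inputs=file_in, outputs=upload_status)
    with gr.Tab("Query"):
        question = gr.Textbox(label="Question")
        answer = gr.Textbox(label="Retrieved chunks", lines=10)
        question.submit(query, inputs=question, outputs=answer)

if __name__ == "__main__":
    # Port matches app_port in the Space configuration above
    demo.launch(server_name="0.0.0.0", server_port=7860)
```

The port is set to 7860 to match `app_port` in the Space configuration; the background file watcher and scheduled maintenance tasks from the Automation Plan would run alongside this interface rather than inside it.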