talexm committed on
Commit e4a2031
1 Parent(s): c3c1187
Files changed (1)
  1. rag_sec/README.md +44 -17
rag_sec/README.md CHANGED
@@ -1,24 +1,51 @@

- # RAG-Chagu Test Suite

- ## Overview
- This project demonstrates a RAG system enhanced with Chagu features for:
- - Data Poisoning Detection
- - Model Drift Handling
- - Query Injection Attack Prevention
- - Adversarial Embedding Detection

- ## Setup

- ### Install Dependencies
- ```bash
- pip install -r requirements.txt
- ```

- ### Run the Test Suite
- ```bash
- python rag_chagu_demo.py
- ```

  ## Requirements
- - Python 3.8 or higher

+ # Document Search and Response Generation System

+ This project implements a **Document Search and Response Generation System** combining semantic search, malicious query detection, and generative response capabilities. It is designed for efficient and context-aware information retrieval and response generation.

+ ---

+ ## Features

+ 1. **Semantic Search**:
+    - Uses SentenceTransformer embeddings for document similarity.
+    - Retrieves top-k relevant documents for a given query.

+ 2. **Malicious Query Detection**:
+    - Identifies and blocks malicious or harmful queries using sentiment analysis.
+
+ 3. **Query Transformation**:
+    - Rephrases or enhances ambiguous queries for better processing.
+
+ 4. **Generative Response**:
+    - Generates a context-aware response using Hugging Face models like `distilgpt2`.
+
+ 5. **Expandable Architecture**:
+    - Modular components for easy enhancement and integration.
+    - Compatible with lightweight and resource-efficient models.
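The malicious query detection feature uses a sentiment classifier as its signal. A minimal sketch of such a gate, assuming a confident `NEGATIVE` label counts as a block (the `0.9` threshold, the empty-query rule, and the names `is_query_blocked` and `stub_classifier` are hypothetical, not taken from the project). The classifier is passed in as a callable with the output shape of a Hugging Face text-classification pipeline, so a stub can stand in for the real model.

```python
from typing import Callable, Dict, List

# A classifier takes a string and returns a list of {"label": ..., "score": ...}
# dicts, the shape a Hugging Face text-classification pipeline returns.
Classifier = Callable[[str], List[Dict[str, object]]]


def is_query_blocked(query: str, classify: Classifier, threshold: float = 0.9) -> bool:
    """Return True when the classifier labels the query NEGATIVE with high confidence.

    Hypothetical policy: sentiment is only a proxy for harmfulness, so the
    threshold is an assumption to tune, not a project setting.
    """
    if not query.strip():
        return True  # assumption: empty queries are rejected outright
    result = classify(query)[0]
    return result["label"] == "NEGATIVE" and float(result["score"]) >= threshold


# With transformers installed, the real classifier could be created as:
#   from transformers import pipeline
#   classify = pipeline("sentiment-analysis",
#                       model="distilbert-base-uncased-finetuned-sst-2-english")
# This stub keeps the example runnable without downloading a model.
def stub_classifier(text: str) -> List[Dict[str, object]]:
    negative = any(word in text.lower() for word in ("hack", "attack", "steal"))
    return [{"label": "NEGATIVE" if negative else "POSITIVE", "score": 0.98}]


print(is_query_blocked("How do I steal passwords?", stub_classifier))  # True
print(is_query_blocked("Summarize the onboarding guide", stub_classifier))  # False
```

Injecting the classifier keeps the gate testable and lets a dedicated toxicity model replace the sentiment model later without changing callers.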
+
+ ---
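The query transformation and expandable architecture features suggest a chain of swappable stages. One way to express that, sketched under assumptions: every stage is a plain callable, and the abbreviation table inside the hypothetical `transform_query` is illustrative only, since the README does not say how queries are actually rephrased. `SearchPipeline` and the stub stages are likewise hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical abbreviation table; the real project may rephrase queries differently.
ABBREVIATIONS = {"docs": "documents", "cfg": "configuration", "auth": "authentication"}


def transform_query(query: str) -> str:
    """Collapse whitespace, lowercase, then expand known abbreviations."""
    words = re.sub(r"\s+", " ", query).strip().lower().split(" ")
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


@dataclass
class SearchPipeline:
    """Chains pluggable stages: gate -> transform -> retrieve -> generate."""

    is_blocked: Callable[[str], bool]
    transform: Callable[[str], str]
    retrieve: Callable[[str], List[str]]
    generate: Callable[[str, List[str]], str]

    def answer(self, query: str) -> str:
        if self.is_blocked(query):
            return "Query rejected."
        rewritten = self.transform(query)
        context = self.retrieve(rewritten)
        return self.generate(rewritten, context)


# Stub stages stand in for the models; each can be replaced independently.
search = SearchPipeline(
    is_blocked=lambda q: "attack" in q.lower(),
    transform=transform_query,
    retrieve=lambda q: [f"doc about {q}"],
    generate=lambda q, ctx: f"{q}: {ctx[0]}",
)
print(search.answer("  Auth   cfg docs "))
```

Because the pipeline only depends on callable signatures, swapping `distilgpt2` for a larger generator or a rule-based transformer for a model-based one touches a single field.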
+
+ ## Architecture
+
+ 1. **Bad Query Detector**:
+    - Detects malicious or inappropriate queries using sentiment analysis (`distilbert-base-uncased-finetuned-sst-2-english`).
+
+ 2. **Query Transformer**:
+    - Rephrases or improves queries for better retrieval results.
+
+ 3. **Document Retriever**:
+    - Encodes documents into dense vectors using `all-MiniLM-L6-v2` embeddings.
+    - Finds similar documents using cosine similarity.
+
+ 4. **Semantic Response Generator**:
+    - Generates context-aware responses using models like `distilgpt2` or `EleutherAI/gpt-neo-1.3B`.
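The Document Retriever ranks documents by cosine similarity between dense vectors. A sketch of that ranking step with NumPy, assuming the embeddings were already computed (for example with `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)`). The 3-dimensional vectors below are made up so the example runs without downloading the model, and `top_k_documents` is a hypothetical name.

```python
import numpy as np


def top_k_documents(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list:
    """Return (index, cosine similarity) pairs for the k most similar documents."""
    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # argsort is ascending; reverse it and keep the first k indices.
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy embeddings standing in for all-MiniLM-L6-v2 output (which is 384-dimensional).
docs = ["refund policy", "shipping times", "password reset"]
doc_vecs = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]])
query_vec = np.array([0.8, 0.2, 0.0])

for idx, score in top_k_documents(query_vec, doc_vecs, k=2):
    print(docs[idx], round(score, 3))
```

Normalizing once and taking a matrix-vector product scores every document in a single pass; for large collections the normalized document matrix can be cached instead of recomputed per query.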
+
+ ---
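For the Semantic Response Generator, the retrieved documents have to be folded into the prompt. A hedged sketch, assuming a simple "context, then question" template (the template and the names `build_prompt`, `generate_answer`, and `stub_generate` are illustrative, not the project's). The generator is any callable with the output shape of a Hugging Face text-generation pipeline, which by default returns the prompt followed by the continuation in `generated_text`.

```python
from typing import Callable, Dict, List

# Same output shape as a transformers text-generation pipeline:
# a list of {"generated_text": prompt + continuation}.
Generator = Callable[[str], List[Dict[str, str]]]


def build_prompt(query: str, documents: List[str], max_chars: int = 1000) -> str:
    """Join retrieved documents as a bulleted context, truncated to max_chars."""
    context = "\n".join(f"- {doc}" for doc in documents)[:max_chars]
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate_answer(query: str, documents: List[str], generate: Generator) -> str:
    prompt = build_prompt(query, documents)
    full_text = generate(prompt)[0]["generated_text"]
    # The pipeline echoes the prompt by default, so keep only the continuation.
    if full_text.startswith(prompt):
        return full_text[len(prompt):].strip()
    return full_text.strip()


# Real generator (requires transformers and a model download):
#   from transformers import pipeline
#   generate = pipeline("text-generation", model="distilgpt2")
#   # or model="EleutherAI/gpt-neo-1.3B" for larger, slower output
def stub_generate(prompt: str) -> List[Dict[str, str]]:
    return [{"generated_text": prompt + " Refunds are issued within 14 days."}]


print(generate_answer("How long do refunds take?", ["refund policy"], stub_generate))
```

Truncating the context by characters is a crude stand-in for token-aware truncation; small models such as `distilgpt2` have a limited context window, so the limit should be tuned against the tokenizer in practice.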

  ## Requirements
+
+ ### Python Libraries
+ Install the necessary libraries using `pip`:
+ ```bash
+ pip install transformers sentence-transformers scikit-learn numpy flask
+ ```
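Two of the pip package names in the install command differ from the module names Python imports: `scikit-learn` is imported as `sklearn` and `sentence-transformers` as `sentence_transformers`. A small sketch that reports which of the listed libraries are missing from the current interpreter; `missing_modules` and `REQUIRED` are hypothetical helpers, not part of the project.

```python
import importlib.util
from typing import Dict, List

# pip package name -> importable module name
REQUIRED = {
    "transformers": "transformers",
    "sentence-transformers": "sentence_transformers",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "flask": "flask",
}


def missing_modules(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names whose modules cannot be found on this interpreter."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_modules()
if missing:
    print("pip install " + " ".join(missing))
else:
    print("All required libraries are installed.")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages such as `transformers`.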