Commit 02ed408 (parent d52f90a) by Shreyas094: Update README.md

Files changed: README.md (+63 −1)
license: apache-2.0
---
# AI-powered Web Search and PDF Chat Assistant

This project combines the power of large language models with web search capabilities and PDF document analysis to create a versatile chat assistant. Users can interact with their uploaded PDF documents or leverage web search to get informative responses to their queries.

## Features

- **PDF Document Chat**: Upload and interact with multiple PDF documents.
- **Web Search Integration**: Option to use web search for answering queries.
- **Multiple AI Models**: Choose from a selection of powerful language models.
- **Customizable Responses**: Adjust temperature and API call settings for fine-tuned outputs.
- **User-friendly Interface**: Built with Gradio for an intuitive chat experience.
- **Document Selection**: Choose which uploaded documents to include in your queries.

## How It Works

1. **Document Processing**:
   - Upload PDF documents, parsed using either PyPDF or LlamaParse.
   - Documents are processed and stored in a FAISS vector database for efficient retrieval.

2. **Embedding**:
   - Utilizes HuggingFace embeddings (default: `sentence-transformers/all-mpnet-base-v2`) for document indexing and query matching.
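The idea behind steps 1 and 2 — embed document chunks, then rank them by similarity to the query — can be sketched without the real libraries. This is a minimal stand-in: the actual app uses HuggingFace embeddings and a FAISS index, while the toy `embed` below is just a bag-of-words vector to make the retrieval principle concrete.

```python
import math

# Toy "embedding": map text to a bag-of-words count vector.
# The real app uses sentence-transformers/all-mpnet-base-v2;
# this stand-in only illustrates the idea.
def embed(text: str) -> dict[str, int]:
    vec: dict[str, int] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (what FAISS does at scale)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Invoices are due within 30 days of receipt.",
    "The warranty covers manufacturing defects only.",
    "Payment can be made by bank transfer or card.",
]
print(retrieve("when are invoices due", chunks, k=1))
```

A FAISS index replaces the linear scan in `retrieve` with an approximate nearest-neighbour search, which is what keeps lookups fast as the number of document chunks grows.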
3. **Query Processing**:
   - For PDF queries, relevant document sections are retrieved from the FAISS database.
   - For web searches, results are fetched using the DuckDuckGo search API.

4. **Response Generation**:
   - Queries are processed using the selected AI model (options include Mistral, Mixtral, and others).
   - Responses are generated based on the retrieved context (from PDFs or web search).

5. **User Interaction**:
   - Users can chat with the AI, asking questions about uploaded documents or general queries.
   - The interface allows for adjusting model parameters and switching between PDF and web search modes.
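The query flow in steps 3–5 can be sketched as a single dispatch function. All names here (`retrieve_chunks`, `search_web`, `call_model`) are hypothetical stand-ins for the project's actual components, not its real API:

```python
def retrieve_chunks(query: str) -> list[str]:
    # Stand-in for a FAISS similarity search over the uploaded PDFs.
    return ["Relevant PDF excerpt about " + query]

def search_web(query: str) -> list[str]:
    # Stand-in for a DuckDuckGo search call.
    return ["Web snippet about " + query]

def call_model(prompt: str, temperature: float = 0.7) -> str:
    # Stand-in for the selected hosted LLM (Mistral, Mixtral, ...).
    return "Answer based on: " + prompt[:40]

def answer(query: str, use_web_search: bool, temperature: float = 0.7) -> str:
    # Step 3: gather context from the PDFs or from the web.
    context = search_web(query) if use_web_search else retrieve_chunks(query)
    # Step 4: ground the model's response in the retrieved context.
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
    return call_model(prompt, temperature)

print(answer("payment terms", use_web_search=False))
```

The key design point is that the model never answers from the raw query alone: retrieved context is always prepended to the prompt, which is what grounds responses in the documents or search results.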
## Setup and Usage

1. Install the required dependencies (list of dependencies to be added).
2. Set up the necessary API keys and tokens in your environment variables.
3. Run the main script to launch the Gradio interface.
4. Upload PDF documents using the file input at the top of the interface.
5. Select documents to query using the checkboxes.
6. Toggle between PDF chat and web search modes as needed.
7. Adjust temperature and number of API calls to fine-tune responses.
8. Start chatting and asking questions!
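For step 2, a fail-fast check at startup avoids confusing errors later. The variable name `HUGGINGFACE_API_TOKEN` below is an assumption — check the main script for the names it actually reads:

```python
import os

def load_config() -> dict:
    # NOTE: the environment variable name is an assumption for illustration;
    # the app may expect a different name for its inference token.
    token = os.environ.get("HUGGINGFACE_API_TOKEN")
    if token is None:
        raise RuntimeError("Set HUGGINGFACE_API_TOKEN before launching the app.")
    return {"hf_token": token}
```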
## Models

The project supports multiple AI models, including:

- mistralai/Mistral-7B-Instruct-v0.3
- mistralai/Mixtral-8x7B-Instruct-v0.1
- meta/llama-3.1-8b-instruct
- mistralai/Mistral-Nemo-Instruct-2407
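Model selection can be pictured as a lookup from a UI choice to one of the ids above. The display names here are invented for illustration; only the model ids come from the list:

```python
# Model ids from the list above, keyed by hypothetical UI display names.
MODELS = {
    "Mistral 7B": "mistralai/Mistral-7B-Instruct-v0.3",
    "Mixtral 8x7B": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "Llama 3.1 8B": "meta/llama-3.1-8b-instruct",
    "Mistral Nemo": "mistralai/Mistral-Nemo-Instruct-2407",
}

def resolve_model(choice: str) -> str:
    """Translate a dropdown choice into the model id sent to the API."""
    if choice not in MODELS:
        raise ValueError(f"Unknown model: {choice}")
    return MODELS[choice]
```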
## Future Improvements

- Integration of more embedding models for improved performance.
- Enhanced PDF parsing capabilities.
- Support for additional file formats beyond PDF.
- Improved caching for faster response times.

## Contribution

Contributions to this project are welcome! Please feel free to submit issues or pull requests on the project's GitHub repository.