---
title: Books Semantic Search
emoji: 🦀
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 4.21.0
app_file: app.py
pinned: false
---


# Document Semantic Search

A simple Gradio interface for semantic search across multiple PDF documents. It combines BM25 keyword scoring with vector embeddings: the script builds a FAISS index over the corpus of uploaded documents, first uses BM25 to retrieve the top-ranked candidates for a query, and then reranks those candidates by their cosine similarity to the search query.
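The two-stage retrieval described above (BM25 candidates, then cosine-similarity reranking) can be sketched in plain Python. This is an illustration under stated assumptions, not the app's implementation: it assumes whitespace tokenization, the common BM25 defaults `k1=1.5` and `b=0.75`, and a hypothetical `embed` function that returns a toy bag-of-words vector in place of a real embedding model. Plain lists stand in for the FAISS index.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization; the app may tokenize differently.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag-of-words count vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query: str, docs: list[str], top_k: int = 3) -> list[int]:
    """Return indices of the top_k BM25 candidates, reranked by cosine similarity."""
    scores = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]
    q_vec = embed(query)
    return sorted(candidates, key=lambda i: cosine(q_vec, embed(docs[i])), reverse=True)


docs = [
    "the crab walks sideways on the beach",
    "semantic search ranks documents by meaning",
    "bm25 is a keyword ranking function for search",
    "pdf documents can be split into pages",
]
print(hybrid_search("semantic search for documents", docs, top_k=3))  # → [1, 2, 3]
```

Using BM25 as a cheap first pass limits the more expensive vector comparison to a small candidate set; with a real embedding model, the rerank step can reorder candidates that share few exact keywords with the query.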

## Setup

See the [Python `venv` documentation](https://docs.python.org/3/library/venv.html) for details on virtual environments.

### Create environment

```shell
python3 -m venv venv
```

### Activate the environment

Unix/macOS:

```shell
source venv/bin/activate
```

Windows:

```shell
venv\Scripts\activate
```

### Install dependencies

Run this command the first time you set up the project, and again whenever the package dependencies change:

```shell
pip install -r requirements.txt
```

## Run

Start the app with the command below. Note that `python app.py` does not enable reload mode; to have the app reload automatically when you edit the Python script, launch it with the Gradio CLI instead (`gradio app.py`).

```shell
python app.py
```