soyuj committed on
Commit 4a62495
1 Parent(s): 3aa77e9

Update README.md

Files changed (1)
  1. README.md +33 -3
README.md CHANGED
@@ -1,3 +1,33 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: transformers
+ tags:
+ - bert
+ - information retrieval
+ - learned sparse model
+ ---
+
+ Paper: [DeeperImpact: Optimizing Sparse Learned Index Structures](https://arxiv.org/abs/2405.17093)
+
+ This repository contains the DeeperImpact model, trained on the MS MARCO passage dataset after the passages were expanded using a [fine-tuned Llama 2 model](https://huggingface.co/soyuj/llama2-doc2query).
+ Training used hard negatives, distillation, and initialization from a pre-trained CoCondenser model.
+
+ The code to train the model and run inference with DeeperImpact can be found in the [DeeperImpact Repo](https://github.com/basnetsoyuj/improving-learned-index).
+
+ For a walkthrough of how to use the model, see the notebook [inference_deeper_impact.ipynb](https://github.com/basnetsoyuj/improving-learned-index/blob/master/inference_deeper_impact.ipynb).
+
+ To run inference on a larger collection of documents, use the following command:
+
+ ```bash
+ python -m src.deep_impact.index \
+ --collection_path <expanded_collection.tsv> \
+ --output_file_path <path> \
+ --model_checkpoint_path soyuj/deeper-impact \
+ --num_processes <n> \
+ --process_batch_size <process_batch_size> \
+ --model_batch_size <model_batch_size>
+ ```
+
+ This command distributes inference across the GPUs available on the machine. To choose which GPUs are used, set the `CUDA_VISIBLE_DEVICES` environment variable.
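
The GPU selection described in the last README line can be sketched in shell. This is a minimal illustration, not part of the commit: the device IDs `0,1` and the variable names `GPU_IDS` and `seen_by_child` are assumptions chosen for the example, and a small `sh -c` child process stands in for the indexing command so that the effect is observable without a GPU.

```shell
# Hypothetical GPU IDs; adjust to the devices present on your machine.
GPU_IDS="0,1"

# A variable assignment placed directly before a command applies to that
# command's environment only. The `sh -c` child below is a stand-in for the
# indexing command and simply reports what it sees.
seen_by_child=$(CUDA_VISIBLE_DEVICES="$GPU_IDS" sh -c 'printf "%s" "$CUDA_VISIBLE_DEVICES"')

echo "child process sees: $seen_by_child"
```

In practice the same prefix would go in front of the `python -m src.deep_impact.index` invocation shown in the README, so that only the listed devices are visible to that run.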