soyuj committed on
Commit
d104f07
1 Parent(s): b73d7a4

Update README.md

Files changed (1)
  1. README.md +46 -3
README.md CHANGED
@@ -1,3 +1,46 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: transformers
+ tags:
+ - information retrieval
+ - llama2
+ - document expansion
+ - LoRA
+ ---
+
+ This repository contains LoRA weights obtained by fine-tuning the pre-trained Llama 2 7B model for document expansion, for use with [DeeperImpact](https://arxiv.org/abs/2405.17093).
+
+ We fine-tune the pre-trained Llama 2 model on the same dataset as DocT5Query, i.e., 532k document-query pairs from the MS MARCO Passage Qrels Train dataset.
+
+ Please refer to the following notebook in the GitHub repository to learn how to use the weights for document expansion: [inference_deeper_impact.ipynb](https://github.com/basnetsoyuj/improving-learned-index/blob/master/inference_deeper_impact.ipynb)
+
+ You can also clone the [DeeperImpact repo](https://github.com/basnetsoyuj/improving-learned-index/blob/master) and run expansions on a collection of documents using the following command:
+
+ ```
+ python -m src.llama2.generate \
+ --llama_path <path | HuggingFaceHub link> \
+ --collection_path <path> \
+ --collection_type [msmarco | beir] \
+ --output_path <path> \
+ --batch_size <batch_size> \
+ --max_tokens 512 \
+ --num_return_sequences 80 \
+ --max_new_tokens 50 \
+ --top_k 50 \
+ --top_p 0.95 \
+ --peft_path soyuj/llama2-doc2query
+ ```
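The `--top_k` and `--top_p` flags in the command above are standard sampling controls. As a rough illustration only, and assuming the flags carry their usual meaning (keep the `top_k` most likely tokens, then keep the smallest prefix whose cumulative probability reaches `top_p`), the filtering step can be sketched in plain Python. The function name `top_k_top_p_filter` and the toy distribution are hypothetical and are not taken from the DeeperImpact repository; real library implementations work on logits and may differ in detail.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return {token: probability} after top-k, then nucleus (top-p) filtering.

    Simplified illustration on a plain probability dict; not the repo's code.
    """
    # Rank tokens by probability (highest first) and keep at most top_k.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Renormalize so the surviving tokens sum to 1 before the nucleus step.
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize once more so the result is a valid distribution to sample from.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (made up for illustration).
toy = {"what": 0.5, "how": 0.3, "why": 0.15, "zebra": 0.05}
filtered = top_k_top_p_filter(toy, top_k=3, top_p=0.75)
```

Here `top_k=3` drops `"zebra"`, and `top_p=0.75` then keeps only `"what"` and `"how"`. Sampling from the filtered distribution `--num_return_sequences` times per document is what yields multiple, varied expansions.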
+
+ This generates a JSONL file with expansions for each document in the collection. To append the unique expansion terms to the original collection, use the following command:
+
+ ```
+ python -m src.llama2.merge \
+ --collection_path <path> \
+ --collection_type [msmarco | beir] \
+ --queries_path <jsonl file generated above> \
+ --output_path <path>
+ ```
+
+
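To make the merge step in the README more concrete, the following is a minimal sketch of what "appending the unique expansion terms" might look like for a single document. It is not the implementation in `src.llama2.merge`: the function names, the lowercase word-level tokenization, and the choice to skip terms already present in the document are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the repo may tokenize differently)."""
    return re.findall(r"\w+", text.lower())


def append_unique_terms(document, queries):
    """Append terms from generated queries that the document does not already contain."""
    seen = set(tokenize(document))
    new_terms = []
    for query in queries:
        for term in tokenize(query):
            if term not in seen:
                # Record each new term once, in order of first appearance.
                seen.add(term)
                new_terms.append(term)
    if not new_terms:
        return document
    return document + " " + " ".join(new_terms)


# Made-up document and generated queries for illustration.
doc = "The Manhattan Project produced the first nuclear weapons."
queries = ["what was the manhattan project", "who built the first atomic bomb"]
expanded = append_unique_terms(doc, queries)
```

In this sketch, terms such as `manhattan` and `first` are skipped because the document already contains them, while `atomic` and `bomb` are appended once each.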