soyuj/llama2-doc2query

	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	tags:
	- information retrieval
	- llama2
	- document expansion
	- LoRA
	---

	This repository contains the LoRA weights obtained by fine-tuning the pre-trained Llama 2 7B model for document expansion, intended for use with [DeeperImpact](https://arxiv.org/abs/2405.17093).

	We fine-tune the pre-trained Llama 2 model on the same dataset as DocT5Query, i.e., 532k document-query pairs from the MSMARCO Passage Qrels Train Dataset.

	To learn how to use the weights for document expansion, please refer to the following notebook in the GitHub repository: [inference_deeper_impact.ipynb](https://github.com/basnetsoyuj/improving-learned-index/blob/master/inference_deeper_impact.ipynb)

	You can also clone the [DeeperImpact repo](https://github.com/basnetsoyuj/improving-learned-index/blob/master) and run expansions on a collection of documents using the following command:

	```
	python -m src.llama2.generate \
	--llama_path <path | HuggingFaceHub link> \
	--collection_path <path> \
	--collection_type [msmarco | beir] \
	--output_path <path> \
	--batch_size <batch_size> \
	--max_tokens 512 \
	--num_return_sequences 80 \
	--max_new_tokens 50 \
	--top_k 50 \
	--top_p 0.95 \
	--peft_path soyuj/llama2-doc2query
	```
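The sampling flags in the generation command (`--top_k 50`, `--top_p 0.95`, `--num_return_sequences 80`) can be illustrated with a small, dependency-free sketch. It assumes the flags carry the usual meaning of the same-named Hugging Face generation parameters, which this README does not state explicitly. The toy distribution and the helper names `top_k_top_p_filter` and `sample_tokens` are hypothetical and are not part of the DeeperImpact code.

```python
import random
from typing import Dict, List


def top_k_top_p_filter(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Restrict a next-token distribution with top-k, then nucleus (top-p) filtering.

    Simplified illustration only; not the library's implementation.
    """
    # Top-k: keep the k most probable tokens and renormalize.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample_tokens(probs: Dict[str, float], num_samples: int, rng: random.Random) -> List[str]:
    """Draw independent samples, loosely analogous to requesting several return sequences."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=num_samples)


# Toy next-token distribution (made up for illustration).
toy_probs = {"what": 0.5, "who": 0.3, "when": 0.15, "why": 0.05}
filtered = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)
samples = sample_tokens(filtered, num_samples=5, rng=random.Random(0))
print(filtered)
print(samples)
```

With these toy numbers, top-k drops `why` and top-p then drops `when`, so only `what` and `who` remain candidates. Drawing many samples per document from such a restricted distribution is what makes the generated queries diverse while still avoiding very unlikely tokens.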

	The generation command writes a JSONL file containing the expansions for each document in the collection. To append the unique expansion terms to the original collection, use the following command:

	```
	python -m src.llama2.merge \
	--collection_path <path> \
	--collection_type [msmarco | beir] \
	--queries_path <jsonl file generated above> \
	--output_path <path>
	```
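As a rough illustration of what appending unique expansion terms could look like, here is a minimal sketch. It assumes that "unique" means terms from the generated queries that do not already occur in the document, and it uses a simple lowercase word tokenizer. Both are assumptions; the actual `src.llama2.merge` module may tokenize and deduplicate differently. The function names are hypothetical.

```python
import re
from typing import List


def unique_expansion_terms(document: str, queries: List[str]) -> List[str]:
    """Return query terms that are absent from the document, in first-seen order."""
    # Assumption: lowercase word tokens are a good enough notion of "term" here.
    seen = set(re.findall(r"\w+", document.lower()))
    new_terms = []
    for query in queries:
        for term in re.findall(r"\w+", query.lower()):
            if term not in seen:
                seen.add(term)  # also prevents duplicates across queries
                new_terms.append(term)
    return new_terms


def expand_document(document: str, queries: List[str]) -> str:
    """Append the unique expansion terms to the end of the document text."""
    terms = unique_expansion_terms(document, queries)
    if not terms:
        return document
    return document + " " + " ".join(terms)


doc = "The Manhattan Project developed the first nuclear weapons."
queries = ["who developed nuclear weapons", "what was the manhattan project"]
print(expand_document(doc, queries))
# → The Manhattan Project developed the first nuclear weapons. who what was
```

Only `who`, `what`, and `was` are appended because every other query term already appears in the document.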