easyrec-roberta-base / README.md

Rbin

Update README.md

90c85e5 verified about 1 month ago

preview code

raw

history blame contribute delete

No virus

3.55 kB

	---
	license: apache-2.0
	language:
	- en
	tags:
	- recommendation
	- collaborative filtering
	---

	# EasyRec-Base

	## Overview

	- Description: EasyRec is a series of language models designed for recommendations, trained to match the textual profiles of users and items with collaborative signals.
	- Usage: You can use EasyRec to encode user and item text embeddings based on the textual profiles that reflect their preferences for various recommendation scenarios.
	- Evaluation: We evaluate the performance of EasyRec in: (i) Text-based zero-shot recommendation and (ii) Text-enhanced collaborative filtering.
	- Finetuned from model: EasyRec is finetuned from [RoBERTa](https://huggingface.co/FacebookAI/roberta-large) within English.

	For details please refer to our [💻[GitHub Code](https://github.com/HKUDS/EasyRec)] and [📖[Paper](https://arxiv.org/abs/2408.08821)].

	## Get Started

	### Environment

	Please run the following commands to create a conda environment:

	```bash
	conda create -y -n easyrec python=3.11
	pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1
	pip install -U "transformers==4.40.0" --upgrade
	pip install accelerate==0.28.0
	pip install tqdm
	pip install sentencepiece==0.2.0
	pip install scipy==1.9.3
	pip install setproctitle
	pip install sentence_transformers
	```

	### Example Codes
	Please first download the codes.
	```ssh
	git clone https://github.com/HKUDS/EasyRec.git
	cd EasyRec
	```

	Here is an example code snippet to utilize EasyRec for encoding text embeddings based on user and item profiles for recommendations.

	```Python
	import torch
	from model import Easyrec
	import torch.nn.functional as F
	from transformers import AutoConfig, AutoModel, AutoTokenizer

	config = AutoConfig.from_pretrained("hkuds/easyrec-roberta-large")
	model = Easyrec.from_pretrained("hkuds/easyrec-roberta-large", config=config,)
	tokenizer = AutoTokenizer.from_pretrained("hkuds/easyrec-roberta-large", use_fast=False,)

	profiles = [
	'This user is a basketball fan and likes to play basketball and watch NBA games.', # user
	'This basketball draws in NBA enthusiasts.', # item 1
	'This item is nice for swimming lovers.' # item 2
	]

	inputs = tokenizer(profiles, padding=True, truncation=True, max_length=512, return_tensors="pt")
	with torch.inference_mode():
	embeddings = model.encode(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask)
	embeddings = F.normalize(embeddings.pooler_output.detach().float(), dim=-1)

	print(embeddings[0] @ embeddings[1]) # 0.8971
	print(embeddings[0] @ embeddings[2]) # 0.2904
	```

	### Model List
	We release a series of EasyRec checkpoints with varying sizes. You can easily load these models from Hugging Face by replacing the model name.
	\| Model \| Model Size \| Recall@20 on Amazon-Sports \|
	\|:-------------------------------\|:--------:\| :--------:\|
	\| [hkuds/easyrec-roberta-small](https://huggingface.co/hkuds/easyrec-roberta-small) \| 82M \| 0.0286 \|
	\| [hkuds/easyrec-roberta-base](https://huggingface.co/hkuds/easyrec-roberta-base) \| 125M \| 0.0518 \|
	\| [hkuds/easyrec-roberta-large](https://huggingface.co/hkuds/easyrec-roberta-large) \| 355M \| 0.0557 \|

	## 🌟 Citation
	If you find this work is helpful to your research, please consider citing our paper:
	```bibtex
	@article{ren2024easyrec,
	title={EasyRec: Simple yet Effective Language Models for Recommendation},
	author={Ren, Xubin and Huang, Chao},
	journal={arXiv preprint arXiv:2408.08821},
	year={2024}
	}
	```
	Thanks for your interest in our work!