metadata
license: apache-2.0
language:
- en
tags:
- recommendation
- collaborative filtering
EasyRec-Small
Overview
- Description: EasyRec is a series of language models designed for recommendations, trained to match the textual profiles of users and items with collaborative signals.
- Usage: You can use EasyRec to encode user and item text embeddings based on the textual profiles that reflect their preferences for various recommendation scenarios.
- Evaluation: We evaluate the performance of EasyRec in: (i) Text-based zero-shot recommendation and (ii) Text-enhanced collaborative filtering.
- Finetuned from model: EasyRec is finetuned from RoBERTa within English.
For details please refer to our [π»GitHub Code] and [πPaper].
Get Started
Environment
Please run the following commands to create a conda environment:
conda create -y -n easyrec python=3.11
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1
pip install -U "transformers==4.40.0" --upgrade
pip install accelerate==0.28.0
pip install tqdm
pip install sentencepiece==0.2.0
pip install scipy==1.9.3
pip install setproctitle
pip install sentence_transformers
Example Codes
Please first download the codes.
git clone https://github.com/HKUDS/EasyRec.git
cd EasyRec
Here is an example code snippet to utilize EasyRec for encoding text embeddings based on user and item profiles for recommendations.
import torch
from model import Easyrec
import torch.nn.functional as F
from transformers import AutoConfig, AutoModel, AutoTokenizer
config = AutoConfig.from_pretrained("hkuds/easyrec-roberta-small")
model = Easyrec.from_pretrained("hkuds/easyrec-roberta-small", config=config,)
tokenizer = AutoTokenizer.from_pretrained("hkuds/easyrec-roberta-small", use_fast=False,)
profiles = [
'This user is a basketball fan and likes to play basketball and watch NBA games.', # user
'This basketball draws in NBA enthusiasts.', # item 1
'This item is nice for swimming lovers.' # item 2
]
inputs = tokenizer(profiles, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.inference_mode():
embeddings = model.encode(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask)
embeddings = F.normalize(embeddings.pooler_output.detach().float(), dim=-1)
print(embeddings[0] @ embeddings[1]) # 0.9260
print(embeddings[0] @ embeddings[2]) # 0.5834
Model List
We release a series of EasyRec checkpoints with varying sizes. You can easily load these models from Hugging Face by replacing the model name.
Model | Model Size | Recall@20 on Amazon-Sports |
---|---|---|
hkuds/easyrec-roberta-small | 82M | 0.0286 |
hkuds/easyrec-roberta-base | 125M | 0.0518 |
hkuds/easyrec-roberta-large | 355M | 0.0557 |
π Citation
If you find this work is helpful to your research, please consider citing our paper:
@article{ren2024easyrec,
title={EasyRec: Simple yet Effective Language Models for Recommendation},
author={Ren, Xubin and Huang, Chao},
journal={arXiv preprint arXiv:2408.08821},
year={2024}
}
Thanks for your interest in our work!