---
license: mit
datasets:
- McAuley-Lab/Amazon-Reviews-2023
language:
- en
tags:
- recommendation
- information retrieval
- Amazon Reviews 2023
base_model: FacebookAI/roberta-large
pipeline_tag: sentence-similarity
---

# BLaIR-roberta-large

BLaIR, short for "**B**ridging **La**nguage and **I**tems for **R**etrieval and **R**ecommendation", is a series of language models pre-trained on the Amazon Reviews 2023 dataset.

BLaIR is grounded on pairs of *(item metadata, language context)*, enabling the models to:
* derive strong item text representations, for both recommendation and retrieval;
* predict the most relevant item given a simple or complex language context.

[[📑 Paper](https://arxiv.org/abs/2403.03952)] · [[💻 Code](https://github.com/hyp1231/AmazonReviews2023)] · [[🌐 Amazon Reviews 2023 Dataset](https://amazon-reviews-2023.github.io/)] · [[🤗 Huggingface Datasets](https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023)] · [[🔬 McAuley Lab](https://cseweb.ucsd.edu/~jmcauley/)]

## Model Details

- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [roberta-large](https://huggingface.co/FacebookAI/roberta-large)
- **Repository:** [https://github.com/hyp1231/AmazonReviews2023](https://github.com/hyp1231/AmazonReviews2023)
- **Paper:** [https://arxiv.org/abs/2403.03952](https://arxiv.org/abs/2403.03952)

## Use with HuggingFace

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hyp1231/blair-roberta-large")
model = AutoModel.from_pretrained("hyp1231/blair-roberta-large")

language_context = 'I need a product that can scoop, measure, and rinse grains without the need for multiple utensils and dishes. It would be great if the product has measurements inside and the ability to rinse and drain all in one. I just have to be careful not to pour too much accidentally.'
item_metadata = [
    'Talisman Designs 2-in-1 Measure Rinse & Strain | Holds up to 2 Cups | Food Strainer | Fruit Washing Basket | Strainer & Colander for Kitchen Sink | Dishwasher Safe - Dark Blue. The Measure Rinse & Strain by Talisman Designs is a 2-in-1 kitchen colander and strainer that will measure and rinse up to two cups. Great for any type of food from rice, grains, beans, fruit, vegetables, pasta and more. After measuring, fill with water and swirl to clean. Strain then pour into your pot, pan, or dish. The convenient size is easy to hold with one hand and is compact to fit into a kitchen cabinet or pantry. Dishwasher safe and food safe.',
    'FREETOO Airsoft Gloves Men Tactical Gloves for Hiking Cycling Climbing Outdoor Camping Sports (Not Support Screen Touch).'
]
texts = [language_context] + item_metadata

inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

# Get the [CLS] embeddings and L2-normalize them
with torch.no_grad():
    embeddings = model(**inputs, return_dict=True).last_hidden_state[:, 0]
    embeddings = embeddings / embeddings.norm(dim=1, keepdim=True)

print(embeddings[0] @ embeddings[1])  # tensor(0.8564)
print(embeddings[0] @ embeddings[2])  # tensor(0.5741)
```
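Once the embeddings are L2-normalized as above, retrieving the best-matching item for a query is just a dot product followed by a sort. A minimal sketch of that ranking step, using small illustrative placeholder vectors rather than actual BLaIR outputs:

```python
import torch

# Illustrative normalized embeddings: row 0 plays the role of the query
# (language context), rows 1..N are candidate items. In real usage these
# would come from the model as shown above.
embeddings = torch.nn.functional.normalize(
    torch.tensor([[0.9, 0.1, 0.0],    # query
                  [0.8, 0.2, 0.1],    # relevant item
                  [0.0, 0.1, 0.9]]),  # irrelevant item
    dim=1,
)

query, items = embeddings[0], embeddings[1:]
scores = items @ query                            # cosine similarities
ranking = torch.argsort(scores, descending=True)  # best match first
print(ranking.tolist())  # prints [0, 1]
```

The same pattern scales to a large candidate pool: encode all item metadata once, stack the normalized vectors, and rank every query against them with a single matrix product.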
62
+
63
+ ## Citation
64
+
65
+ If you find Amazon Reviews 2023 dataset, BLaIR checkpoints, Amazon-C4 dataset, or our scripts/code helpful, please cite the following paper.
66
+
67
+ ```bibtex
68
+ @article{hou2024bridging,
69
+ title={Bridging Language and Items for Retrieval and Recommendation},
70
+ author={Hou, Yupeng and Li, Jiacheng and He, Zhankui and Yan, An and Chen, Xiusi and McAuley, Julian},
71
+ journal={arXiv preprint arXiv:2403.03952},
72
+ year={2024}
73
+ }
74
+ ```
75
+
76
+ ## Contact
77
+
78
+ Please let us know if you encounter a bug or have any suggestions/questions by [filling an issue](https://github.com/hyp1231/AmazonReview2023/issues/new) or emailing Yupeng Hou at [[email protected]](mailto:[email protected]).