---
license: cc-by-4.0
datasets:
- bltlab/queryner
language:
- en
metrics:
- f1
pipeline_tag: token-classification
inference:
  parameters:
    aggregation_strategy: "first"
---

# Model Card for Model ID

E-commerce query segmentation model in English.
The model is trained on the QueryNER training dataset plus augmented data, which should make it more robust to spelling mistakes and to mentions unseen in the training data.


## Model Details

### Model Description

This is a token classification model using BERT base uncased as the base model.
The model is fine-tuned on the [QueryNER training dataset](https://huggingface.co/datasets/bltlab/queryner) and augmented data as described in the QueryNER paper.


- **Developed by:** [BLT Lab](https://github.com/bltlab) in collaboration with eBay.
- **Funded by:** eBay
- **Shared by:** [@cpalenmichel](https://github.com/cpalenmichel)
- **Model type:** Token Classification / Sequence Labeling / Chunking
- **Language(s) (NLP):** English
- **License:** CC-BY 4.0
- **Finetuned from model:** BERT base uncased

### Model Sources

The underlying model is [BERT base uncased](https://huggingface.co/google-bert/bert-base-uncased).

- **Repository:** [https://github.com/bltlab/query-ner](https://github.com/bltlab/query-ner)
- **Paper:** Accepted at LREC-COLING; link coming soon.

## Uses

### Direct Use

The model is intended for research purposes and e-commerce query segmentation.

### Downstream Use

Potential downstream use cases include weighting entity spans, linking to knowledge bases, and removing spans as a recovery strategy for null and low-recall queries.

### Out-of-Scope Use

This model is trained only on the training data of the QueryNER dataset. It may not perform well on other domains without additional training data and further fine-tuning.

## Bias, Risks, and Limitations

See the limitations section of the paper.

## How to Get Started with the Model

See the Hugging Face tutorials for token classification and load the model with `AutoModelForTokenClassification`.
Note that we apply post-processing that uses only the tag of each word's first subtoken, unlike the inference API.
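The first-subtoken post-processing mentioned above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, `model_id` is a placeholder for this model's Hub ID (not stated here), and a fast tokenizer is assumed so that `word_ids` is available.

```python
from typing import List, Optional, Sequence


def first_subtoken_labels(
    word_ids: Sequence[Optional[int]], subtoken_labels: Sequence[str]
) -> List[str]:
    """Keep only the label predicted for the first subtoken of each word."""
    labels: List[str] = []
    previous_word = None
    for word_id, label in zip(word_ids, subtoken_labels):
        # Special tokens such as [CLS] and [SEP] have no word id.
        if word_id is None:
            continue
        # A new word id marks the first subtoken of the next word.
        if word_id != previous_word:
            labels.append(label)
        previous_word = word_id
    return labels


def tag_query(words: List[str], model_id: str) -> List[str]:
    """Tag a whitespace-split query; `model_id` is a placeholder Hub ID."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits
    predicted_ids = logits.argmax(dim=-1)[0].tolist()
    subtoken_labels = [model.config.id2label[i] for i in predicted_ids]
    return first_subtoken_labels(encoding.word_ids(0), subtoken_labels)
```

Calling `tag_query("red nike shoes".split(), model_id)` with the actual model ID should return one label per input word.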

## Training Details

### Training Data

See paper for details.


### Training Procedure

See paper for details.

#### Training Hyperparameters

See paper for details.


## Evaluation

Evaluation details are provided in the paper.
Scoring was done with [SeqScore](https://github.com/bltlab/seqscore), using the conlleval repair method for invalid label transition sequences.
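As a rough illustration of what a conlleval-style repair does, the sketch below assumes BIO labels and converts any `I-` tag that cannot continue the previous chunk into a `B-` tag. It is not SeqScore's implementation; the function name and label names are hypothetical, and SeqScore itself should be used for reported scores.

```python
from typing import List


def repair_bio(labels: List[str]) -> List[str]:
    """Turn an I- tag that cannot continue the previous chunk into a B- tag."""
    repaired: List[str] = []
    previous = "O"
    for label in labels:
        if label.startswith("I-"):
            entity_type = label[2:]
            # An I- tag is valid only after B- or I- of the same type.
            continues = previous in (f"B-{entity_type}", f"I-{entity_type}")
            if not continues:
                label = f"B-{entity_type}"
        repaired.append(label)
        previous = label
    return repaired
```

With this rule, an invalid prediction still yields a chunk rather than being discarded.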

### Testing Data, Factors & Metrics

#### Testing Data

QueryNER test set: [https://huggingface.co/datasets/bltlab/queryner](https://huggingface.co/datasets/bltlab/queryner)


#### Factors
Evaluation is reported as micro-F1 at the entity level on the QueryNER test set.
We used the conlleval repair method for invalid label transitions.

#### Metrics
We use micro-F1 at the entity level, which is common practice for NER models.
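For readers unfamiliar with the metric, entity-level micro-F1 can be sketched as below. This is a simplified illustration that assumes valid BIO label sequences (for example, after repair). The function names and labels are hypothetical, and this is not the SeqScore code used for the reported results.

```python
from typing import List, Set, Tuple

# (sentence index, start, end exclusive, entity type)
Span = Tuple[int, int, int, str]


def extract_spans(sentences: List[List[str]]) -> Set[Span]:
    """Collect entity spans from BIO-labeled sentences."""
    spans: Set[Span] = set()
    for sent_idx, labels in enumerate(sentences):
        start, entity_type = None, None
        # A trailing "O" sentinel closes a span that runs to the end.
        for i, label in enumerate(labels + ["O"]):
            if start is not None and not label.startswith("I-"):
                spans.add((sent_idx, start, i, entity_type))
                start, entity_type = None, None
            if label.startswith("B-"):
                start, entity_type = i, label[2:]
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact-match entity spans, pooled across types."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)
```

A predicted span counts as correct only if its boundaries and type both match a gold span exactly.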

### Results

[More Information Needed]


## Environmental Impact
Rough estimate

- **Hardware Type:** 1 RTX 3090 GPU 
- **Hours used:** < 2 hours
- **Cloud Provider:** Private
- **Compute Region:** northamerica-northeast1
- **Carbon Emitted:** 0.02


## Citation

Accepted at LREC-COLING; citation details coming soon.

**BibTeX:**

Coming soon.


## Model Card Authors 

Chester Palen-Michel [@cpalenmichel](https://github.com/cpalenmichel)

## Model Card Contact

Chester Palen-Michel [@cpalenmichel](https://github.com/cpalenmichel)