youval committed
Commit cb2fde5
1 Parent(s): 80c4451

Update model card with FP16 info (#1)


- model card update with fp16 info (5bff5510294775fc06e3f9f5ce04873378a2f38d)
- remove gpu type for memory usage (5624ab98611c4a437f4c1a382e93fb69d06666be)
- adding nvidia l4 inference speed info (ed4c792921bb46c80c3a8407e085ca8c7b5fa5b2)
- being more general about minimal version with fp16 and gpus cc 8.9+ (448101435807a563cc516cb0ef1174037977ad8d)

Files changed (1)
  1. README.md +87 -67
README.md CHANGED
@@ -1,67 +1,87 @@
- ---
- language:
- - de
- - en
- - es
- - fr
- ---
-
- # Model Card for `answer-finder-v1-L-multilingual`
-
- This model is a question answering model developed by Sinequa. It produces two lists of logit scores corresponding to
- the start token and end token of an answer.
-
- Model name: `answer-finder-v1-L-multilingual`
-
- ## Supported Languages
-
- The model was trained and tested in the following languages:
-
- - English
- - French
- - German
- - Spanish
-
- ## Scores
-
- | Metric | Value |
- |:--------------------------------------------------------------|-------:|
- | F1 Score on SQuAD v2 EN with Hugging Face evaluation pipeline | 75 |
- | F1 Score on SQuAD v2 EN with Haystack evaluation pipeline | 75 |
- | F1 Score on SQuAD v2 FR with Haystack evaluation pipeline | 73.4 |
- | F1 Score on SQuAD v2 DE with Haystack evaluation pipeline | 90.8 |
- | F1 Score on SQuAD v2 ES with Haystack evaluation pipeline | 67.1 |
-
- ## Inference Time
-
- | GPU Info | Batch size 1 | Batch size 32 |
- |:--------------------------------------------------------------|---------------:|---------------:|
- | NVIDIA A10 | 4 ms | 84 ms |
- | NVIDIA T4 | 15 ms | 362 ms |
-
- **Note that the Answer Finder models are only used at query time.**
-
- ## Requirements
-
- - Minimal Sinequa version: 11.10.0
- - GPU memory usage: 1060 MiB
-
- Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
- size of 32. It does not include the fix amount of memory that is consumed by the ONNX Runtime upon initialization which
- can be around 0.5 to 1 GiB depending on the used GPU.
-
- ## Model Details
-
- ### Overview
-
- - Number of parameters: 110 million
- - Base language model: [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
- pre-trained by Sinequa in English, French, German and Spanish
- - Insensitive to casing and accents
-
- ### Training Data
-
- - [SQuAD v2](https://rajpurkar.github.io/SQuAD-explorer/)
- - [French-SQuAD](https://github.com/Alikabbadj/French-SQuAD) + French translation of SQuAD v2 "impossible" query-passage pairs
- - [GermanQuAD](https://www.deepset.ai/germanquad) + German translation of SQuAD v2 "impossible" query-passage pairs
- - [SQuAD-es-v2](https://github.com/ccasimiro88/TranslateAlignRetrieve)
+ ---
+ language:
+ - de
+ - en
+ - es
+ - fr
+ ---
+
+ # Model Card for `answer-finder-v1-L-multilingual`
+
+ This model is a question answering model developed by Sinequa. It produces two lists of logit scores corresponding to the start token and end token of an answer.
+
+ Model name: `answer-finder-v1-L-multilingual`
+
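+ Below is a minimal usage sketch, added for illustration rather than taken from the original card: it assumes the checkpoint loads through the standard Hugging Face `AutoModelForQuestionAnswering` API, and the repository id shown is a placeholder.
+
+ ```python
+ from transformers import AutoModelForQuestionAnswering, AutoTokenizer
+ import torch
+
+ # Placeholder repository id; adjust to the actual location of the checkpoint.
+ model_id = "sinequa/answer-finder-v1-L-multilingual"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForQuestionAnswering.from_pretrained(model_id)
+
+ question = "Où se trouve la tour Eiffel ?"
+ passage = "La tour Eiffel se situe à Paris, en France."
+ inputs = tokenizer(question, passage, return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)  # one start logit and one end logit per token
+
+ # Pick the most likely answer span (no-answer handling for SQuAD v2 is omitted here).
+ start = int(outputs.start_logits.argmax())
+ end = int(outputs.end_logits.argmax())
+ print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
+ ```
+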
+ ## Supported Languages
+
+ The model was trained and tested in the following languages:
+
+ - English
+ - French
+ - German
+ - Spanish
+
+ ## Scores
+
+ | Metric | Value |
+ |:--------------------------------------------------------------|-------:|
+ | F1 Score on SQuAD v2 EN with Hugging Face evaluation pipeline | 75 |
+ | F1 Score on SQuAD v2 EN with Haystack evaluation pipeline | 75 |
+ | F1 Score on SQuAD v2 FR with Haystack evaluation pipeline | 73.4 |
+ | F1 Score on SQuAD v2 DE with Haystack evaluation pipeline | 90.8 |
+ | F1 Score on SQuAD v2 ES with Haystack evaluation pipeline | 67.1 |
+
+ ## Inference Time
+
+ | GPU | Quantization type | Batch size 1 | Batch size 32 |
+ |:------------------------------------------|:------------------|---------------:|---------------:|
+ | NVIDIA A10 | FP16 | 2 ms | 30 ms |
+ | NVIDIA A10 | FP32 | 4 ms | 83 ms |
+ | NVIDIA T4 | FP16 | 3 ms | 65 ms |
+ | NVIDIA T4 | FP32 | 14 ms | 373 ms |
+ | NVIDIA L4 | FP16 | 2 ms | 38 ms |
+ | NVIDIA L4 | FP32 | 5 ms | 124 ms |
+
+ **Note that the Answer Finder models are only used at query time.**
+
+ ## GPU Memory Usage
+
+ | Quantization type | Memory |
+ |:-------------------------------------------------|-----------:|
+ | FP16 | 547 MiB |
+ | FP32 | 1060 MiB |
+
+ Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which can be around 0.5 to 1 GiB depending on the GPU used.
+
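+ For context, the measurement setup described above could be approximated with a sketch like the one below; it is an illustration added to this card, and the ONNX file name, input names and sequence length are assumptions rather than values documented by Sinequa.
+
+ ```python
+ import numpy as np
+ import onnxruntime as ort
+
+ # Placeholder file name: the card does not state the path of the ONNX export.
+ session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
+
+ # Dummy batch of 32 sequences; input names and length 256 follow a typical BERT QA export.
+ inputs = {
+     "input_ids": np.zeros((32, 256), dtype=np.int64),
+     "attention_mask": np.ones((32, 256), dtype=np.int64),
+     "token_type_ids": np.zeros((32, 256), dtype=np.int64),
+ }
+ outputs = session.run(None, inputs)  # expected outputs: start logits and end logits
+
+ # GPU memory attributable to the model can then be observed externally
+ # (for example with nvidia-smi) while the session is alive.
+ ```
+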
+ ## Requirements
+
+ - Minimal Sinequa version: 11.10.0
+ - Minimal Sinequa version for using FP16 models and GPUs with a CUDA compute capability of 8.9+ (such as the NVIDIA L4): 11.11.0
+ - [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use); a quick way to check this is sketched below
+
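+ The following sketch, added for illustration and not part of the original card, reads the local GPU's compute capability with PyTorch so it can be compared against the thresholds above:
+
+ ```python
+ import torch
+
+ # The card lists "above 5.0" as the general requirement and "above 6.0" for FP16 use.
+ if torch.cuda.is_available():
+     major, minor = torch.cuda.get_device_capability(0)
+     print(f"CUDA compute capability: {major}.{minor}")
+ else:
+     print("No CUDA device visible.")
+ ```
+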
+ ## Model Details
+
+ ### Overview
+
+ - Number of parameters: 110 million
+ - Base language model: [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
+ pre-trained by Sinequa in English, French, German and Spanish
+ - Insensitive to casing and accents
+
+ ### Training Data
+
+ - [SQuAD v2](https://rajpurkar.github.io/SQuAD-explorer/)
+ - [French-SQuAD](https://github.com/Alikabbadj/French-SQuAD) + French translation of SQuAD v2 "impossible" query-passage pairs
+ - [GermanQuAD](https://www.deepset.ai/germanquad) + German translation of SQuAD v2 "impossible" query-passage pairs
+ - [SQuAD-es-v2](https://github.com/ccasimiro88/TranslateAlignRetrieve)