remove gpu type for memory usage
README.md CHANGED
@@ -45,20 +45,18 @@ The model was trained and tested in the following languages:
 
 ## GPU Memory usage
 
+| Quantization type |   Memory |
+|:------------------|---------:|
+| FP16              |  547 MiB |
+| FP32              | 1060 MiB |
 
-Note that GPU memory usage only includes how much GPU memory the actual model consumes on those specific GPUs with a batch
+Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
 size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which
 can be around 0.5 to 1 GiB depending on the GPU used.
 
 ## Requirements
 
-- Minimal Sinequa version: 11.10.0
+- Minimal Sinequa version: 11.10.0 (for NVIDIA L4 with FP16: 11.11.0)
 - [Cuda compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)
 
 ## Model Details
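The compute-capability requirement in the Requirements section ("above 5.0, above 6.0 for FP16 use") can be sketched as a small check. This is a hypothetical helper, not part of the model card or of Sinequa; the `(major, minor)` pair is assumed to come from a runtime query such as PyTorch's `torch.cuda.get_device_capability()`:

```python
# Hypothetical helper (not part of the model card): check the stated CUDA
# compute-capability requirement -- above 5.0 in general, above 6.0 for FP16.

def meets_capability_requirement(major: int, minor: int, fp16: bool = False) -> bool:
    """Return True if compute capability `major.minor` is strictly above the
    required minimum (5.0, or 6.0 when FP16 inference is used)."""
    required_major = 6 if fp16 else 5
    # Tuple comparison gives the "strictly above X.0" semantics, so e.g. 5.2
    # passes the general requirement but 6.0 does not pass the FP16 one.
    return (major, minor) > (required_major, 0)

# The (major, minor) pair would be obtained at runtime, e.g. with PyTorch:
#   major, minor = torch.cuda.get_device_capability()
print(meets_capability_requirement(7, 5))              # NVIDIA T4 is 7.5 -> True
print(meets_capability_requirement(6, 0, fp16=True))   # 6.0 is not *above* 6.0 -> False
```

Strict comparison (`>` rather than `>=`) mirrors the wording "above 5.0"; if the intent were "5.0 or higher", the tuple comparison would use `>=` instead.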