sinequa/answer-finder-v1-L-multilingual

Question Answering


youval committed on Feb 19

Commit 1c14ce9 • 1 Parent(s): cb2fde5

remove duplicate memory section

Files changed (1)

README.md +0 -11

@@ -51,17 +51,6 @@ The model was trained and tested in the following languages:
 | FP16                                             |    550 MiB |
 | FP32                                             |   1050 MiB |
-Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
-size of 32. It does not include the fix amount of memory that is consumed by the ONNX Runtime upon initialization which
-can be around 0.5 to 1 GiB depending on the used GPU.
-## GPU Memory usage
-| Quantization type                                |   Memory   |
-|:-------------------------------------------------|-----------:|
-| FP16                                             |    547 MiB |
-| FP32                                             |   1060 MiB |
 Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch size of 32. It does not include the fix amount of memory that is consumed by the ONNX Runtime upon initialization which can be around 0.5 to 1 GiB depending on the used GPU.
 ## Requirements
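
As a rough illustration of the note in the hunk above, the GPU memory to budget is the model's own footprint plus the ONNX Runtime initialization overhead. The sketch below uses the retained table values (550 MiB for FP16, 1050 MiB for FP32, measured on an NVIDIA T4 with a batch size of 32) and assumes the quoted 0.5 to 1 GiB overhead corresponds to 512 to 1024 MiB. The names `MODEL_MEMORY_MIB`, `RUNTIME_OVERHEAD_MIB`, and `estimate_total_mib` are hypothetical and not part of the model card.

```python
# Model-only GPU memory from the retained README table, in MiB
# (NVIDIA T4, batch size 32).
MODEL_MEMORY_MIB = {"FP16": 550, "FP32": 1050}

# Assumption: the note's "around 0.5 to 1 GiB" ONNX Runtime overhead,
# converted to MiB. The real value depends on the GPU in use.
RUNTIME_OVERHEAD_MIB = (512, 1024)


def estimate_total_mib(quantization):
    """Return a (low, high) estimate of total GPU memory in MiB."""
    model = MODEL_MEMORY_MIB[quantization]
    low, high = RUNTIME_OVERHEAD_MIB
    return model + low, model + high


for quantization in MODEL_MEMORY_MIB:
    low, high = estimate_total_mib(quantization)
    print(f"{quantization}: roughly {low} to {high} MiB in total")
```

These ranges are only estimates under the stated assumptions; other batch sizes or GPUs may give different figures.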
