iAkashPaul/gemma-7b-it-gguf

iAkashPaul committed on Feb 22, 2024

Commit 7c9fc52 · verified · 1 parent: 2ff140c

Update README.md

Files changed (1)

README.md +3 -1

README.md CHANGED

@@ -16,4 +16,6 @@ Contains Q4 & Q8 quantized GGUFs for [google/gemma](https://huggingface.co/colle
 | Variant | Device | Perf |
 | - | - | - |
 | Q4 | RTX 2070S | 22 tok/s |
-| Q8 | RTX 2070S | 7 tok/s (could only offload 23/29 layers to GPU) |
+| | M1 Pro 10-core GPU | 28 tok/s |
+| Q8 | RTX 2070S | 7 tok/s (could only offload 23/29 layers to GPU) |
+| | M1 Pro 10-core GPU | 17 tok/s |
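The throughput figures in the updated table can also be read as per-token latency and as cross-device ratios. Below is a small Python sketch of that arithmetic, using only the numbers reported in the diff. The dictionary layout and the helper names (`ms_per_token`, `speedup`) are illustrative assumptions, not part of the repository, and the ratios compare single reported runs rather than controlled benchmarks.

```python
# Throughput figures copied from the README table (tokens per second).
# Keys are (quant variant, device); values are as reported, not re-measured.
PERF_TOK_PER_S = {
    ("Q4", "RTX 2070S"): 22,
    ("Q4", "M1 Pro 10-core GPU"): 28,
    ("Q8", "RTX 2070S"): 7,
    ("Q8", "M1 Pro 10-core GPU"): 17,
}


def ms_per_token(tok_per_s: float) -> float:
    """Convert throughput (tokens/s) into average latency per token (ms)."""
    return 1000.0 / tok_per_s


def speedup(variant: str, fast: str, slow: str) -> float:
    """Ratio of reported throughput on `fast` vs `slow` for one quant variant."""
    return PERF_TOK_PER_S[(variant, fast)] / PERF_TOK_PER_S[(variant, slow)]


# Fraction of layers offloaded to the GPU in the Q8 / RTX 2070S run (23 of 29).
OFFLOADED, TOTAL = 23, 29
offload_fraction = OFFLOADED / TOTAL

if __name__ == "__main__":
    for (variant, device), tps in PERF_TOK_PER_S.items():
        print(f"{variant} on {device:<20} {tps:>3} tok/s = {ms_per_token(tps):6.1f} ms/token")
    for variant in ("Q4", "Q8"):
        ratio = speedup(variant, "M1 Pro 10-core GPU", "RTX 2070S")
        print(f"{variant}: M1 Pro / RTX 2070S throughput ratio = {ratio:.2f}x")
    print(f"Q8 on RTX 2070S: {offload_fraction:.0%} of layers on GPU")
```

The larger gap for Q8 is consistent with the note in the table that only 23 of 29 layers fit on the RTX 2070S, leaving the rest on the CPU; the sketch only restates that arithmetic and does not establish the cause.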