JingweiZuo committed
Commit bdad334 • 1 Parent(s): ded105d

Update README.md

Files changed (1):
  1. README.md +6 -0
README.md CHANGED
@@ -29,6 +29,7 @@ language:
 - **Language(s) (NLP):** Mainly English
 - **License:** TII Falcon-Mamba License 2.0
 
+<br>
 
 # Usage
 
@@ -185,6 +186,8 @@ Also, we applied *BatchScaling* during the rampup — rescaling learning rate \\
 
 The model training took roughly two months.
 
+<br>
+
 # Evaluation
 
 ## Benchmarks
@@ -235,6 +238,7 @@ pip install "causal-conv1d>=1.4.0" mamba-ssm
 Refer to our technical report for more details about performance evaluation.
 
 
+<br>
 
 # Technical Specifications
 
@@ -262,6 +266,8 @@ Falcon-Mamba-7B was trained on AWS SageMaker, using on average 256 H100 80GB GPU
 
 Falcon-Mamba-7B was trained on an internal distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO, high-performance Triton kernels.
 
+<br>
+
 # Citation
 
 *Paper coming soon* 😊.
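
For context, the Usage section touched by the first hunk loads the model through transformers. Below is a minimal sketch, assuming the repo id `tiiuae/falcon-mamba-7b` and a transformers release that ships FalconMamba support (neither is shown in this diff); the `causal-conv1d`/`mamba-ssm` install quoted in the third hunk's header is the optional optimized-kernel path.

```python
# Minimal usage sketch (assumption: repo id "tiiuae/falcon-mamba-7b" and a
# transformers version that includes FalconMamba; not confirmed by this diff).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short completion; with causal-conv1d/mamba-ssm installed,
# transformers is expected to use the faster fused-kernel code path.
inputs = tokenizer("Question: What is a state space language model?\nAnswer:",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```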