JingweiZuo committed
Commit • bdad334
1 Parent(s): ded105d
Update README.md
README.md CHANGED
@@ -29,6 +29,7 @@ language:
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-Mamba License 2.0

+<br>

# Usage

@@ -185,6 +186,8 @@ Also, we applied *BatchScaling* during the rampup – rescaling learning rate \\

The model training took roughly two months.

+<br>
+
# Evaluation

## Benchmarks

@@ -235,6 +238,7 @@ pip install "causal-conv1d>=1.4.0" mamba-ssm
Refer to our technical report for more details about performance evaluation.


+<br>

# Technical Specifications

@@ -262,6 +266,8 @@ Falcon-Mamba-7B was trained on AWS SageMaker, using on average 256 H100 80GB GPU

Falcon-Mamba-7B was trained on an internal distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO, high-performance Triton kernels.

+<br>
+
# Citation

*Paper coming soon* 😊.
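For orientation, the third hunk anchors on the README's install step (`pip install "causal-conv1d>=1.4.0" mamba-ssm`), which the Usage section builds on. The sketch below shows the standard `transformers` loading path those packages support; it is illustrative only and not part of this commit, and the repo id `tiiuae/falcon-mamba-7b` is assumed from the README this commit edits.

```python
# Minimal sketch (not from this diff): load and query Falcon-Mamba-7B after
# installing causal-conv1d and mamba-ssm. The repo id below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 7B weights well within one GPU
    device_map="auto",           # requires the accelerate package
)

# Simple generation call to confirm the install works end to end.
inputs = tokenizer("Question: What is Falcon-Mamba?\nAnswer:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```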