JingweiZuo committed
Commit • bdad334
1 Parent(s): ded105d
Update README.md
README.md CHANGED
@@ -29,6 +29,7 @@ language:
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-Mamba License 2.0

+<br>

# Usage

@@ -185,6 +186,8 @@ Also, we applied *BatchScaling* during the rampup – rescaling learning rate \\

The model training took roughly two months.

+<br>
+
# Evaluation

## Benchmarks

@@ -235,6 +238,7 @@ pip install "causal-conv1d>=1.4.0" mamba-ssm
Refer to our technical report for more details about performance evaluation.


+<br>

# Technical Specifications

@@ -262,6 +266,8 @@ Falcon-Mamba-7B was trained on AWS SageMaker, using on average 256 H100 80GB GPU

Falcon-Mamba-7B was trained on an internal distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO, high-performance Triton kernels.

+<br>
+
# Citation

*Paper coming soon* 😊.
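For orientation, the third hunk anchors on the README's install step (`pip install "causal-conv1d>=1.4.0" mamba-ssm`), which the Usage section builds on. The sketch below shows the standard `transformers` loading path those packages support; it is illustrative only and not part of this commit, and the repo id `tiiuae/falcon-mamba-7b` is assumed from the README this commit edits.

```python
# Minimal sketch (not from this diff): load and query Falcon-Mamba-7B after
# installing causal-conv1d and mamba-ssm. The repo id below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 7B weights well within one GPU
    device_map="auto",           # requires the accelerate package
)

# Simple generation call to confirm the install works end to end.
inputs = tokenizer("Question: What is Falcon-Mamba?\nAnswer:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```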