slimfrikha-tii Kasper Piskorski committed
Commit 528ca22 · verified · 1 Parent(s): a9c965c

Update README.md (#2)


- Update README.md (9af2c409cc322be6598edde67652aaf2b64c0a99)


Co-authored-by: Kasper Piskorski <[email protected]>

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -16,16 +16,16 @@ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
 
  **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
 
- This repository contains the **Falcon3-7B-Base**. It achieves state of art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
- Falcon3-7B-Base supports 4 languages (english, french, spanish, portuguese) and a context length up to 32K.
+ This repository contains the **Falcon3-7B-Base**. It achieves state-of-the-art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
+ Falcon3-7B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
 
- ⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most usecases.**
+ ⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.**
 
  ## Model Details
  - Architecture
- - Transformer based causal decoder only architecture
+ - Transformer-based causal decoder-only architecture
  - 28 decoder blocks
- - Grouped query attention (GQA) for faster inference: 12 query heads and 4 key value heads
+ - Grouped Query Attention (GQA) for faster inference: 12 query heads and 4 key-value heads
  - Wider head dimension: 256
  - High RoPE value to support long context understanding: 1000042
  - Uses SwiGLU and RMSNorm
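
The architecture figures in the README lines above (28 decoder blocks, 12 query heads, 4 key-value heads, head dimension 256, RoPE value 1000042) imply a few derived quantities that may help when reading the model card. The following is a minimal sketch, not the model's actual code: the class and function names are hypothetical, the "RoPE value" is assumed to be the rotary base, and the frequency formula is the standard RoPE definition rather than anything confirmed by this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical container for the figures listed in the README."""
    num_layers: int = 28        # decoder blocks
    num_query_heads: int = 12   # GQA query heads
    num_kv_heads: int = 4       # GQA key-value heads
    head_dim: int = 256         # "wider head dimension"
    rope_base: float = 1000042.0  # assumption: the "RoPE value" is the rotary base


def queries_per_kv_head(cfg: AttentionConfig) -> int:
    """Number of query heads that share each key-value head under GQA."""
    if cfg.num_query_heads % cfg.num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of key-value heads")
    return cfg.num_query_heads // cfg.num_kv_heads


def kv_cache_elements_per_token(cfg: AttentionConfig) -> int:
    """Cached key and value elements stored per token across all layers."""
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim


def rope_inverse_frequencies(cfg: AttentionConfig) -> list[float]:
    """Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2)."""
    half = cfg.head_dim // 2
    return [cfg.rope_base ** (-2 * i / cfg.head_dim) for i in range(half)]


if __name__ == "__main__":
    cfg = AttentionConfig()
    print(queries_per_kv_head(cfg))          # 3
    print(kv_cache_elements_per_token(cfg))  # 57344
    print(len(rope_inverse_frequencies(cfg)))  # 128
```

With 4 key-value heads instead of 12, the per-token cache in this sketch is one third of what the same layout would need if every query head had its own key-value head, which is consistent with the README's "faster inference" remark about GQA.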