Text Generation
Transformers
Safetensors
English
olmoe
Mixture of Experts
olmo
conversational
Inference Endpoints
Muennighoff committed
Commit 783d931
1 Parent(s): 4260823

0824 > 0924

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -13,7 +13,7 @@ co2_eq_emissions: 1
 
 # Model Summary
 
- > OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in August 2024 (0824) that has been adapted via SFT and DPO from [OLMoE-1B-7B](https://hf.co/OLMoE/OLMoE-1B-7B-0824). It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.
+ > OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in September 2024 (0924) that has been adapted via SFT and DPO from [OLMoE-1B-7B](https://hf.co/OLMoE/OLMoE-1B-7B-0924). It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.
 
 - Code: https://github.com/allenai/OLMoE
 - Paper:
@@ -30,8 +30,8 @@ import torch
 DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
 
 # Load different ckpts via passing e.g. `revision=kto`
- model = OlmoeForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-Instruct").to(DEVICE)
- tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-Instruct")
+ model = OlmoeForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0924-Instruct").to(DEVICE)
+ tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924-Instruct")
 message = [{"role": "user", "content": "Explain to me like I'm five what is Bitcoin."}]
 inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
 out = model.generate(**inputs, max_length=64)
@@ -40,9 +40,9 @@ print(tokenizer.decode(out[0]))
 ```
 
 Branches:
- - `main`: Preference tuned via DPO model of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT (`main` branch)
- - `load-balancing`: Ablation with load balancing loss during DPO starting from the `load-balancing` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT
- - `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/OLMoE/OLMoE-1B-7B-0824)
+ - `main`: Preference tuned via DPO model of https://hf.co/OLMoE/OLMoE-1B-7B-0924-SFT (`main` branch)
+ - `load-balancing`: Ablation with load balancing loss during DPO starting from the `load-balancing` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0924-SFT
+ - `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0924-SFT which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/OLMoE/OLMoE-1B-7B-0924)
 - `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto*` branches correspond to the other checkpoints mentioned in the paper.
 
 # Citation
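
For context on the renamed repository, below is a minimal, self-contained sketch of how the README snippet reads after this commit, assuming the `allenai/OLMoE-1B-7B-0924-Instruct` repo id from the diff and using the `transformers` `revision=` argument (with `load-balancing` chosen purely as an example) to select one of the branches listed above. It also names the chat-message list `messages` consistently and passes the tokenized tensor to `generate` directly, which the snippet in the diff does not; this is an illustrative sketch, not the model card itself.

```python
# Sketch only: load one of the ablation branches of the renamed 0924 repo.
# "load-balancing" is an example; any branch name from the list above
# (main, load-balancing, non-annealed, kto, ...) can be passed as `revision`.
import torch
from transformers import AutoTokenizer, OlmoeForCausalLM

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

model = OlmoeForCausalLM.from_pretrained(
    "allenai/OLMoE-1B-7B-0924-Instruct", revision="load-balancing"
).to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained(
    "allenai/OLMoE-1B-7B-0924-Instruct", revision="load-balancing"
)

messages = [{"role": "user", "content": "Explain to me like I'm five what is Bitcoin."}]
# apply_chat_template with tokenize=True and return_tensors="pt" returns a tensor of input ids
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(DEVICE)
out = model.generate(inputs, max_length=64)
print(tokenizer.decode(out[0]))
```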