---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigscience-openrail-m
datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
library_name: transformers
tags:
- code
language:
- en
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/643a9dd0c5f633a7fa7e804a/HkB0QYV0BbmB3ktMugbZy.png)


# Refact-1.6B-base

Finally, the model we started training with our [blog post](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) is ready 🎉
The model may still have some issues, especially with the FIM format.

# It Works As a Chat

The primary application of this model is code completion (infill) in multiple programming languages,
but it also works quite well as a chat model.

# Example

Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix parts of the input and output:

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# The model fills in the code between <fim_prefix> and <fim_suffix>.
prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_length=100, temperature=0.2, do_sample=True)
print("-" * 80)
print(tokenizer.decode(outputs[0]))
```

# Chat Format

The same model works as a chat model (experimental).

```python
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")
```
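
To actually generate a reply, the prompt can be fed to `generate()` the same way as in the FIM example. A minimal sketch, assuming the `model`, `tokenizer`, and `device` from the snippet above are already loaded (the generation settings here are illustrative, not tuned):

```python
# Reuses `tokenizer`, `model`, and `device` from the FIM example above.
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=256, temperature=0.2, do_sample=True)
# Drop the prompt tokens so only the assistant's reply is printed.
print(tokenizer.decode(outputs[0][inputs.shape[1]:]))
```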

# Architecture

As described in more detail in the blog post, we used:

- [ALiBi](https://arxiv.org/abs/2108.12409) based attention (sketched below)
- [LayerNorm](https://arxiv.org/abs/1607.06450v1) instead of [RMSNorm](https://arxiv.org/pdf/1910.07467.pdf)
- [Multi Query Attention](https://arxiv.org/abs/1911.02150)

We also used the LION optimizer, flash attention, and early dropout. None of it is so exotic that you can't run the model -- in fact you can, see the example above.
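
For intuition, here is a minimal sketch of the ALiBi idea (illustrative only, not the model's actual implementation): instead of positional embeddings, each attention head adds a linear penalty to its attention logits that grows with the distance between query and key, with a fixed per-head slope:

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head slopes form a geometric sequence, as in the ALiBi paper
    # (this closed form is exact when n_heads is a power of two).
    start = 2.0 ** (-8.0 / n_heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(n_heads)])
    # distance[i, j] = i - j: how far key j lies behind query i.
    pos = torch.arange(seq_len)
    distance = pos[:, None] - pos[None, :]
    # Linear penalty on the attention logits, growing with distance;
    # positions with j > i are handled by the causal mask, not here.
    return -slopes[:, None, None] * distance.clamp(min=0)  # (n_heads, seq_len, seq_len)
```

The bias is simply added to the attention scores before the softmax, which is what lets ALiBi models handle contexts longer than those seen in training.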


# Training

For the base model, we used our own dataset, which contains only permissively licensed code, plus open text datasets.
Filtering is the key to the success of this model:

- We only used text in English
- Only topics related to computer science
- Applied heavy deduplication (see the illustrative sketch below)

The text-to-code ratio was 50:50, and the model was trained for 1.2T tokens.
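
We don't detail the filtering pipeline here, but as a purely illustrative sketch (not our actual pipeline), the exact-duplicate part of deduplication can be as simple as hashing normalized documents:

```python
import hashlib

def dedup_exact(docs: list[str]) -> list[str]:
    # Illustrative only: drop documents that are byte-identical after
    # whitespace normalization. Real pipelines typically add near-duplicate
    # detection (e.g. MinHash) on top of a pass like this.
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```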

We don't release the base model, because its Fill-in-the-Middle (FIM) capability likes to repeat itself too much, so
its practical use is limited. But if you still want it, write us a message on Discord.


# Limitations and Bias

The Refact-1.6B model was trained on English text, but it has seen many more languages in
code comments. Its performance on non-English languages is noticeably lower.


# Model Stats

- **Architecture:** LLaMA-like model with multi-query attention
- **Objectives:** Fill-in-the-Middle, Chat
- **Context:** 4096 tokens
- **Pretraining tokens:** 1.2T
- **Finetuning tokens:** 40B
- **Precision:** bfloat16
- **GPUs:** 64 NVIDIA A5000
- **Training time:** 28 days


# License

The model is licensed under the BigScience OpenRAIL-M v1 license agreement.


# Citation

If you use this model, please link to this page.