+---
+pipeline_tag: text-generation
+inference: true
+widget:
+- text: 'def print_hello_world():'
+  example_title: Hello world
+  group: Python
+license: bigscience-openrail-m
+datasets:
+- books
+- arxiv
+- c4
+- falcon-refinedweb
+- wiki
+- github-issues
+- stack_markdown
+library_name: transformers
+tags:
+- code
+language:
+- en
+---
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/643a9dd0c5f633a7fa7e804a/HkB0QYV0BbmB3ktMugbZy.png)
+# Refact-1.6B-base
+Finally, the model we started training when we published our [blog post](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) is ready 🎉
+The model may still have some problems, especially with the FIM format.
+# It Works As a Chat
+The primary application of this model is code completion (infill) in multiple programming languages,
+but it also works quite well as a chat model.
+# Example
+Fill-in-the-middle (FIM) uses special tokens to mark the prefix, middle, and suffix parts of the input and output:
+```python
+# pip install -q transformers
+from transformers import AutoModelForCausalLM, AutoTokenizer
+checkpoint = "smallcloudai/Refact-1_6B-fim"
+device = "cuda" # for GPU usage or "cpu" for CPU usage
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
+prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'
+inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
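+# temperature only takes effect when sampling is enabled, so do_sample=True is set here
+outputs = model.generate(inputs, max_length=100, do_sample=True, temperature=0.2)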
+print("-"*80)
+print(tokenizer.decode(outputs[0]))
+```
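The prompt in the example above can also be assembled from separate prefix and suffix strings. The sketch below is a hypothetical convenience wrapper: `build_fim_prompt` and `extract_middle` are illustrative names, not part of the model's code. It uses only the three special tokens shown above, and it assumes that the decoded output echoes the prompt followed by the generated middle, possibly ending in an `<|endoftext|>` marker (an assumption; check `tokenizer.eos_token` for the real value).

```python
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Place the code before and after the gap in prefix/suffix/middle order."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(decoded: str, eos_token: str = "<|endoftext|>") -> str:
    """Return the text that follows <fim_middle>, cut at the first EOS marker.

    eos_token is an assumed default. If <fim_middle> is missing from the
    decoded text, an empty string is returned.
    """
    _, _, middle = decoded.partition(FIM_MIDDLE)
    return middle.split(eos_token, 1)[0]


# Rebuilds the same prompt string as the example above.
prompt = build_fim_prompt(
    prefix='def print_hello_world():\n    """',
    suffix='\n    print("Hello world!")',
)
```

With the model loaded as in the example above, `extract_middle(tokenizer.decode(outputs[0]))` would return only the infilled text, ready to be spliced between the prefix and the suffix.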
+# Chat Format
+The same model also works as a chat model (experimental).
+```python
+prompt_template = "<empty_output>SYSTEM {system}\n" \
+                  "<empty_output>USER {query}\n" \
+                  "<empty_output>ASSISTANT"
+prompt = prompt_template.format(system="You are a programming assistant",
+                                query="How do I sort a list in Python?")
+```
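The template above can be wrapped in a small function and paired with a step that trims the reply. This is a minimal sketch with hypothetical helper names (`build_chat_prompt`, `extract_reply`). It assumes that the decoded output starts with the prompt text and that the model may open a new turn with `<empty_output>`; neither behavior is confirmed by this card.

```python
CHAT_TEMPLATE = (
    "<empty_output>SYSTEM {system}\n"
    "<empty_output>USER {query}\n"
    "<empty_output>ASSISTANT"
)


def build_chat_prompt(query: str, system: str = "You are a programming assistant") -> str:
    """Fill the chat template with a system message and a single user query."""
    return CHAT_TEMPLATE.format(system=system, query=query)


def extract_reply(decoded: str, prompt: str) -> str:
    """Drop the echoed prompt and anything after the next turn marker.

    Assumption: a following turn, if the model generates one, begins with
    <empty_output>.
    """
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    return reply.split("<empty_output>", 1)[0].strip()


chat_prompt = build_chat_prompt("How do I sort a list in Python?")

# With tokenizer, model and device set up as in the FIM example:
# inputs = tokenizer.encode(chat_prompt, return_tensors="pt").to(device)
# outputs = model.generate(inputs, max_length=300)
# print(extract_reply(tokenizer.decode(outputs[0]), chat_prompt))
```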
+# Architecture
+As described in more detail in the blog post, we used:
+- [ALiBi](https://arxiv.org/abs/2108.12409) based attention
+- [LayerNorm](https://arxiv.org/abs/1607.06450v1) instead of [RMSNorm](https://arxiv.org/pdf/1910.07467.pdf)
+- [Multi Query Attention](https://arxiv.org/abs/1911.02150)
+We also used the LiON optimizer, flash attention, and early dropout. None of this is so exotic that you can't run the model; in fact you can, as the example above shows.
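To make the ALiBi item above concrete, here is a minimal pure-Python sketch of the bias described in the linked paper: each head gets a fixed slope, and attention scores are penalized in proportion to the distance between query and key. It follows the paper's formulation for a power-of-two number of heads and is not taken from this model's implementation; the function names are illustrative.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... for a power-of-two head count n."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Bias added to causal attention scores for one head.

    Entry [i][j] is -slope * (i - j) for keys at or before the query
    position, and -inf for future positions (causal mask).
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)          # 1/2, 1/4, ..., 1/256
bias = alibi_bias(slopes[0], 4)   # 4x4 bias matrix for the first head
```

Because the bias depends only on relative distance, no learned position embeddings are needed.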
+# Training
+For the base model, we used our own dataset that contains code with permissive licenses only, and open text datasets.
+Filtering is the key to this model's success:
+- We used only English text
+- We kept only topics related to computer science
+- We applied heavy deduplication
+The text-to-code ratio was 50:50, and the model was trained on 1.2T tokens.
+We don't release the base model, because its Fill-in-the-Middle (FIM) output tends to repeat itself too much, so
+its practical use is limited. But if you still want it, write us a message on Discord.
+# Limitations and Bias
+The Refact-1.6B model was trained on English text, but it has seen many more languages in
+code comments. Its performance on non-English languages is certainly lower.
+# Model Stats
+- **Architecture:** LLaMA-like model with multi-query attention
+- **Objectives:** Fill-in-the-Middle, Chat
+- **Context length:** 4096 tokens
+- **Pretraining tokens:** 1.2T
+- **Finetuning tokens:** 40B
+- **Precision:** bfloat16
+- **GPUs:** 64 NVIDIA A5000
+- **Training time:** 28 days
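As a rough back-of-the-envelope reading of these numbers, the sketch below estimates the size of the weights and the average training throughput. It rests on two assumptions that the card does not state: that the parameter count is the 1.6B implied by the model name, and that the 28 days cover both pretraining and finetuning.

```python
PARAMS = 1.6e9             # assumed from the model name, not listed in the stats
BYTES_PER_PARAM = 2        # bfloat16 stores each value in 2 bytes
PRETRAIN_TOKENS = 1.2e12
FINETUNE_TOKENS = 40e9
GPUS = 64
DAYS = 28                  # assumed to include both training stages

# Memory needed for the weights alone, in gigabytes (10^9 bytes).
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9

# Average tokens processed per GPU per second over the whole run.
total_tokens = PRETRAIN_TOKENS + FINETUNE_TOKENS
gpu_seconds = GPUS * DAYS * 24 * 3600
tokens_per_gpu_second = total_tokens / gpu_seconds

print(f"weights: ~{weights_gb:.1f} GB")
print(f"throughput: ~{tokens_per_gpu_second:.0f} tokens per GPU-second")
```

Under these assumptions the weights take about 3.2 GB, and the run averages roughly 8,000 tokens per GPU per second.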
+# License
+The model is licensed under the BigScience OpenRAIL-M v1 license agreement.
+# Citation
+If you are using this model, please give a link to this page.