---
license: apache-2.0
datasets:
- hamishivi/gsm8k-symbolic
language:
- en
base_model:
- hamishivi/tess2-v0.3-base
---
# TESS 2 v0.3 Symbolic - A Math-specific Tuned Diffusion LM

This is the TESS 2 model trained on the GSM8k symbolic dataset found [here](https://huggingface.co/datasets/hamishivi/gsm8k-symbolic), adapted from [here](https://github.com/HKUNLP/diffusion-of-thoughts). TESS 2 is a simplex-based diffusion language model adapted from Mistral 7B and further trained on Dolma 1.7 and Tulu 2 SFT data; this variant is based on Mistral v0.3 and finetuned on the GSM8k symbolic data.
For more details, please check out our paper [TESS 2: A Large-Scale Generalist Diffusion Language Model](https://arxiv.org/abs/2502.13917).

This model will only work with our custom codebase found [here](https://github.com/hamishivi/tess-2) -- please go there to see details on how to run training and inference.


## Using this model

To run this model, first clone https://github.com/hamishivi/tess-2.

Then, after creating a Python environment with the correct packages, you can run inference via an interactive UI with:
```sh
./shell_scripts/run_interactive_demo.sh hamishivi/tess2-v0.3
```

This lets you interact with the model directly and shows the diffusion generation process as it unfolds.
For training or other evaluations, please see our main repository.

## Citation

If you find this work useful, please cite it as follows:

```bibtex
@misc{taeivison2025tess2,
  title={{TESS 2: A Large-Scale Generalist Diffusion Language Model}},
  author={Jaesung Tae and Hamish Ivison and Sachin Kumar and Arman Cohan},
  year={2025},
  eprint={2502.13917},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.13917},
}
```