BeanCounter
An open dataset containing more than 150B tokens of low-toxicity and business-oriented text.
This model was created from Microsoft's Phi-1.5 model by continued pretraining on the BeanCounter dataset. Full details of the training process are available in Wang and Levy (2024). The model has not undergone any safety checks or alignment, so it should be used for research purposes only.
If you use this model in your work, please cite us:
@inproceedings{
wang2024beancounter,
title={BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text},
author={Siyan Wang and Bradford Levy},
booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024},
url={https://openreview.net/forum?id=HV5JhUZGpP}
}