Will you consider releasing a public dataset?
Here's the thing: I've noticed that from Mega to Manticore, and now Hippogriff, you all seem to have been using the Pygmalion dataset. The open-source community has probably also realized that better, more open-ended role-play doesn't require aligning with datasets such as Alpaca and Vicuna, which more closely resemble GPT output. Instead, we should lean towards Pygmalion.
If you released datasets like Pygmalion and HellaSwag (updated with 30K+ rows), it would encourage the open-source community to train better Pygmalion-based models on Falcon, Guanaco, RedPajama, BLOOM, and other models.
Unfortunately, I'm bound by oath not to release the Pygmalion dataset. The HellaSwag dataset I'm using is here: https://huggingface.co/datasets/winglian/evals/blob/main/hellaswag/hellaswag.jsonl
> releasing datasets like Pygmalion
From what I've heard from one of the people involved with the project, they don't release it because it contains a lot of data that might be upsetting to some people. If you actually intend to use it for training and have trained models in the past, you can probably reach out to one of the members for a copy.
Thank you both for your patient explanations and for sharing. I will try to contact the Pygmalion team.