Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
takarajordan 
posted an update 14 days ago
Post
2197
I'm super excited to release my first open-source text dataset:

WorldScenario 20K is a novel dataset of 20,000 synthetically generated multi-stakeholder scenarios designed to simulate real-world decision-making processes. Each scenario explores a unique environmental, societal, or economic issue.

I used the brand new meta-llama/Llama-3.3-70B-Instruct model to generate this dataset and I put the dataset through some post processing to clean and evaluate the dataset for diversity.

I'd appreciate some feedback and thoughts on my new release! Thanks!

takarajordan/WorldScenario_20K

congrats!

·

Thanks Clem!!

Sir how do u preprocess the dataset as i have also created a dataset for my university to fine tune llama 2 model but it does not giving me good output so please help me

·

I preprocessed this into ChatML format to train the model takarajordan/WorldScenario-3.2B_GGUF and I used Unsloth to finetune it!

If you want more help join the HuggingFace discord, I'm always in there.

This comment has been hidden