Exact Training data used?
#1 opened by nlpguy
Thanks for this amazing model. Is there an exact breakdown, by source, of the 1T tokens used for training? Or is the specific collection of public corpora that was used available?
psinger changed discussion status to closed