Only with Tess-v1.3
I would try one with just Tess-M-v1.3.
Tess-M-v1.2 is very good, but has some repetition issues at very long context.
I sorted it out with Tess-M-v1.3: https://migel.substack.com/p/testing-tess-m-v13
Yeah, I read the post! I actually did a pure 1.2 merge before 1.3 came out, and it didn't seem to repeat or output JSON unprompted in my quick tests (out to about 60K).
https://huggingface.co/brucethemoose/Capybara-Tess12-Yi-34B-200K-DARE
The merge is not straightforward; some of the model weights get trimmed. Hence I suspect this one won't repeat, since 1.2 was only merged in at ~1/5 weight with reduced density. I was already planning to remerge with just 1.3 if it repeats like Tess 1.2, or maybe to turn the density way down.
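For anyone curious what "1/5 weight with less density" looks like in practice, a DARE merge like this is usually set up with a mergekit config along these lines. This is just an illustrative sketch, not the actual config used for the linked model; the model paths and exact parameter values are assumptions:

```yaml
# Hypothetical mergekit DARE config (illustrative values only).
# "density" controls what fraction of each model's delta weights
# survive the random trimming; "weight" scales its contribution.
merge_method: dare_ties
base_model: larryvrh/Yi-34B-200K-Llamafied
models:
  - model: NousResearch/Nous-Capybara-34B
    parameters:
      weight: 0.8
      density: 0.6
  - model: migtissera/Tess-M-v1.2
    parameters:
      weight: 0.2   # the ~1/5 weight mentioned above
      density: 0.3  # lower density = more of its deltas trimmed
dtype: bfloat16
```

Turning Tess's density down further is the knob I'd reach for first if repetition shows up, since it trims more of the finetune's deltas before they ever reach the merge.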
Also, @migtissera while you are here, have you considered training Tess 1.3 on top of nous capybara?
Maybe even applying the existing QLoRA would work well? Before any 200K finetunes came out, regular 32K Yi LoRAs worked well enough on the 200K base model.
I think you were right, I swapped Tess out for an airoboros 200K finetune.