My experience

#2
by Ardvark123 - opened

This one is very similar to the first, meaning it is really good. It stayed coherent and only just started to mess up names and who was who at about 18,000 context for me, and even then only some of the time. The downside is I noticed way more GPT-isms in this version, though nothing a bit of editing won't fix, so it could have been luck of the draw. But a solid model.

Very interesting! Did you prefer cgato/L3-TheSpice-8b-v0.1.3? [Quants.]

Testing is always tricky because a lot of the time the changes are small and it's hard to be sure with our normal roleplay testing. I didn't play much with TheSpice personally, so feedback is useful for me to paint a better picture in my head.

Hmm, well, I actually went back and tried that absurdly high-context chat, since I use vector storage and save it all. The original model seemed to have a little more trouble keeping up that high in comparison. So based on that alone, I will take a few editable GPT-isms for this higher-context ability. Though, as you stated, it can be kind of subjective, I guess. I hope it helped a little though!

I appreciate it, thanks!

The author seems to recommend disabling instruct mode and using the default context mode, not ChatML instruct?

You're right, @Hardeh ! - Adjusted.

Most llama-3 RP models I've tried fall apart just after 16K; it's like a sudden brain injury.

I guess it's due to training in smaller context batches than what Meta can manage.
I also learnt that Meta trained llama-3-8B on two of their nodes, each with a casual 24,576 GPUs; it's estimated that it took them 3 days to train the entire model 😭
And that's only ~49K of the 500K GPUs they do have 😶‍🌫️
Imagine how quick making LoRAs would be-

Sai, it's time to get a job at the Meta AI division.

Or commit robbery :3
Some unrelated math:
Humans speak between 7,000 and 16,000 words per day; taking the highest option of 16,000:
16,000 × 365 = 5,840,000 words per person per year
5,840,000 × 8,109,690,961 = 47,360,595,212,240,000 words spoken per year worldwide, roughly
An H100 at FP8 with GPT-J can achieve over 10,000 output tok/s at peak throughput for 64 concurrent requests, and Meta has 500K+ H100s' worth of compute power.
That is 5,000,000,000 tokens per second on Meta's hardware alone.
Or 18,000,000,000,000 tokens per hour; 432,000,000,000,000 tokens per day; 157,680,000,000,000,000 tokens per year
Or roughly 3.329 times more tokens than humans speak words per year
One H100 can output about 54,000 times more tokens in a day (864,000,000) than one human speaks words (16,000)
Using every H100 that has been shipped, it would be possible to rewrite the entire history of the earth in a couple of years.
Also, an H100 weighs about 3 kg per GPU, and Meta has 500K H100s' worth of compute, meaning if you piled up all of Meta's GPUs they would weigh at least 1,500 tonnes.
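For anyone who wants to double-check, the back-of-envelope math above can be reproduced in a few lines of Python. All the inputs are the rough assumptions from this thread (16,000 words/day, the ~10,000 tok/s H100 claim, 500K GPUs, 3 kg each), not measured data:

```python
# Rough figures assumed in the thread above, not measured data
WORDS_PER_DAY = 16_000        # upper estimate of words a human speaks daily
WORLD_POP = 8_109_690_961     # world population figure used above
TOK_PER_S_PER_H100 = 10_000   # claimed peak H100 FP8 throughput (GPT-J, 64 requests)
NUM_H100 = 500_000            # Meta's claimed H100-equivalent fleet
KG_PER_H100 = 3               # rough per-GPU weight

# Words spoken worldwide per year
words_per_year = WORDS_PER_DAY * 365 * WORLD_POP

# Token throughput of the whole fleet
tokens_per_s = TOK_PER_S_PER_H100 * NUM_H100
tokens_per_year = tokens_per_s * 3600 * 24 * 365

# Single-GPU output per day vs a single human's words per day
per_h100_per_day = TOK_PER_S_PER_H100 * 86_400

print(f"{words_per_year:,}")                       # 47,360,595,212,240,000
print(f"{tokens_per_s:,}")                         # 5,000,000,000
print(f"{tokens_per_year:,}")                      # 157,680,000,000,000,000
print(f"{tokens_per_year / words_per_year:.3f}")   # 3.329
print(per_h100_per_day // WORDS_PER_DAY)           # 54000
print(NUM_H100 * KG_PER_H100 / 1000)               # 1500.0 tonnes
```

Everything downstream scales linearly with these inputs, so swapping in a different words-per-day figure or throughput number just rescales the ratios.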
