Beyond frustrating
The temptation to go into debt for a machine that could handle 70B+ models is real. It's beyond frustrating that I can barely handle a Q1 version of something like this. My curiosity to know and review these models is overwhelming, but here I am, stuck with 40 gigs of RAM and a pitiful 4GB of VRAM.
Okay, rant over. Thanks for listening.
It's OK. We are listening.
(Now... with a graphics card that can handle prompt processing, 40GB of DRAM, and patience, you could run a Q2_K not half-bad. I did have 64GB RAM/12GB VRAM when I ran goliath 120b as a Q2_K, and not an imatrix quant either. Yes, context length will be meager as well. Even an IQ3_M should fit, although a lot more patience would be required.)
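(For anyone wanting to try that kind of split, here's a minimal sketch using llama-cpp-python. The file name and layer count below are illustrative placeholders, not specific to any upload here; tune n_gpu_layers down until it fits in your VRAM.)

```python
# Minimal sketch: partial GPU offload with llama-cpp-python.
# Assumes a GPU-enabled build of llama-cpp-python; the model
# path and n_gpu_layers value are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="goliath-120b.Q2_K.gguf",  # hypothetical local file
    n_gpu_layers=8,   # offload a handful of layers; the rest stays in DRAM
    n_ctx=2048,       # keep the context modest to save memory
)

out = llm("Write a two-sentence story about patience.", max_tokens=128)
print(out["choices"][0]["text"])
```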
I actually have 128GB of spare DDR3 RAM (scavenged from an old Dell server) and a Tesla K80 lying around. I was considering getting a Bitcoin mining motherboard to use for AI stuff.
Now think how all that sounds to the average Joe with 16GB RAM and 0GB VRAM.
My parents and girlfriend look at me with glazed eyes whenever I talk about this stuff. I wonder if this is how car people feel when they start talking about motors and... car things.
I've been a programmer since early childhood. I've known that look my whole life :)
@Utochi Why don't you use Command-r plus? It's free and better than any of these open models for RP. Look it up on Cohere's site. I've tried almost everything, and it's the best so far. You don't lose much by skipping these experimental models. You can also try a lot of them on OpenRouter for a little money. My recommendation: don't purchase a high-end GPU yet. They aren't worth it with these unrefined models.
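(If you do go the OpenRouter route: it exposes an OpenAI-compatible API, so trying a hosted model is only a few lines. A sketch, assuming the `openai` Python package and an API key in `OPENROUTER_API_KEY`; the model id below is an assumption, so double-check the exact string in their catalog.)

```python
# Sketch: querying a hosted model through OpenRouter's
# OpenAI-compatible endpoint. The model id is an assumption --
# check OpenRouter's model list for the exact string.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="cohere/command-r-plus",  # assumed id; verify before use
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
)
print(resp.choices[0].message.content)
```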
Well, I was really waiting for this model too, but so far llama-3 has left me mostly unimpressed. Refusals, instant "end of story" wrap-ups, general confusion. And I'm not even asking for anything unethical. And something weird is going on with this model specifically: the i-quants seem close to broken, sometimes printing gibberish, sometimes producing very low quality output. What the heck is going on?
What's your go-to model been as of late, @mradermacher?
Still QuartetAnemoi, sometimes Midnight-Miqu and for a while I tried Moist-Miqu, but that was... too unstable. I am eagerly waiting for some llama3-based (or even qwen2-based) model, but I found qwen2 rather underwhelming, especially in its language support, and nothing l3-based felt right. In fact, every l3-based model sooner or later spams me with refusal messages. But then, maybe it just takes half a year... Anyway, not entirely happy with the state of things :)
Ok, the weird quality issue was a problem with my download, not the actual files here. It's now just... underwhelming, not broken. Sigh.
Why not just rent a GPU online when you want to use one? My computer is a Mac mini, lol; I could never run anything useful. Even though I spend too much money on it (not that I have that much to spend, haha), it's WAY cheaper than buying any useful hardware.
mradermacher/Camelidae-8x7B-GGUF
You could try sparse models. That one isn't an imatrix quant, but you could probably fit an IQ3_XS with a little CPU offloading. Depending on how much CPU RAM you want to use, you could go higher. It's nothing special, but it's a decent model.
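(Back-of-envelope for whether a given quant fits: the file is roughly parameters × bits-per-weight / 8, plus some overhead for context. A rough sketch; the bits-per-weight figures below are ballpark values, not exact llama.cpp numbers.)

```python
# Back-of-envelope GGUF sizing: estimate file size for a quant
# and see how it would split across VRAM and system RAM.
# The bits-per-weight values are rough ballpark figures.
BPW = {"Q2_K": 2.6, "IQ3_XS": 3.3, "IQ3_M": 3.7}

def est_size_gb(params_billion: float, quant: str) -> float:
    """Approximate model file size in GB."""
    return params_billion * 1e9 * BPW[quant] / 8 / 1e9

for quant in BPW:
    size = est_size_gb(70, quant)  # a 70B model, like the rant above
    print(f"70B {quant}: ~{size:.0f} GB; with 4 GB VRAM, "
          f"~{max(size - 4.0, 0):.0f} GB would sit in system RAM")
```

So a 70B Q2_K at roughly 23 GB does squeeze into 40 GB of RAM, which matches the experience described above.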