Eval Requests

#31
by isr431 - opened

Can you please add anthracite-org/magnum-v2-4b, nbeerbower/Lyra-Gutenberg-mistral-nemo-12B and nbeerbower/Gutensuppe-mistral-nemo-12B to the leaderboard? Thanks!

Would love to see cognitivecomputations/dolphin-2.9.4-gemma2-2b added too!

Sao10K/L3.1-70B-Euryale-v2.2 was just released!

Added.
From the model page:
"May be less 'uncensored' zero-shot due to removal of c2 samples"
Sadge

Thanks! Can you also add Sao10K/MN-12B-Lyra-v3 & anthracite-org/magnum-v2-4b? Just a question: what format do you use for testing models (GGUF, AWQ, etc.)?

I test all models as Q4_K_M GGUF quants, both because it's cheaper and because most people don't run full models; they run quants. I run the models on an oobabooga RunPod instance, and support for those 4Bs doesn't seem to have been added yet, so I'm still waiting on that. I'll test the new Lyra once the right quant has been made 👍

Can you add an 'Unknown' option to the model sizes? It should show only closed-source models or models with unknown parameter counts.

Can you add TheDrummer/Hubble-4B-v1? Thx

TheDrummer/UnslopNemo-v1-GGUF was just released

Can you pls add maywell/PiVoT-0.1-Evil-a?

Sao10K/MN-12B-Lyra-v4 from sao!

anthracite-org/magnum-v3-9b-chatml-gguf and anthracite-org/magnum-v3-9b-customgemma2-gguf were released, would be interested to see how they compare!

nbeerbower/Lyra4-Gutenberg-12B and nbeerbower/gemma2-gutenberg-27B released

TheDrummer/UnslopNemo-v2-GGUF

nbeerbower/Lyra4-Gutenberg-12B has just been released; it wasn't out yet when I originally posted about it.

anthracite-org/magnum-v3-27b-kto

Hey, thank you for your work! Could you please add these models? They recently gained support in llama.cpp:
https://huggingface.co/allenai/OLMoE-1B-7B-0924
https://huggingface.co/allenai/OLMoE-1B-7B-0924-SFT
https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct

Can you please evaluate this model: nbeerbower/mistral-nemo-gutades-12B

Hey there! Can you add the newly released Gemini 1.5 Pro 002 to the leaderboard? The previous Gemini models were pretty uncensored, interesting to see how this performs.

Can you please add bartowski/Mistral-Nemo-Gutenberg-Doppel-12B-GGUF? Thanks

Can you please add nbeerbower/Lyra4-Gutenberg2-12B and nbeerbower/Gemma2-Gutenberg-Doppel-9B?

Please also add Tiger Gemma v3

Rocinante v2 (formerly UnslopNemo) has a 10/10 W rating on the leaderboard, but I have run into a lot of refusals and disclaimers. It needs reevaluation; the model is frequently updated. https://huggingface.co/TheDrummer/UnslopNemo-v2-GGUF

Can you please add ZeusLabs/Chronos-Platinum-72B? Thanks.

It needs reevaluation. The model is frequently updated.

Looking at the commits, it doesn't seem like the files have been updated.

Can you please evaluate this model: smelborp/StellarDong-72b? Thanks.

In my tests with this model on text-generation-webui + llama.cpp and on SillyTavern + koboldcpp, both with temp=1.0, topk=1, the ChatML instruct format, and the recommended system prompt, I observed different behaviors. Text-generation-webui responds to all prompts with few or no disclaimers, while SillyTavern makes the model behave like a censored one. I suspect tokenization may be broken in either text-generation-webui or llama.cpp/llama-cpp-python, because it splits tokens like "<|im_start|>" into multiple tokens instead of a single one.
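To make the suspected tokenization issue concrete, here is a minimal Python sketch. It builds a single-turn ChatML prompt and runs it through a toy tokenizer with and without special-token parsing. The tokenizer (`toy_tokenize`) and its splitting rules are hypothetical stand-ins and not the actual tokenizer used by llama.cpp, llama-cpp-python, or text-generation-webui. The sketch only illustrates how a marker like "<|im_start|>" ends up as several pieces when it isn't recognized as a special token.

```python
# Hypothetical sketch: the toy tokenizer below is NOT llama.cpp's tokenizer;
# it only mimics the "special token split into pieces" symptom.

CHATML_SPECIAL = ("<|im_start|>", "<|im_end|>")


def build_chatml(system: str, user: str) -> str:
    """Format a single-turn conversation in ChatML, ending with the assistant header."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def toy_tokenize(text: str, parse_special: bool) -> list[str]:
    """Split text into word pieces and single punctuation characters.

    When parse_special is True, ChatML markers are kept as single tokens;
    otherwise they are broken into "<", "|", "im_start", "|", ">".
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        if parse_special:
            # Match a whole special marker at the current position, if any.
            match = next((s for s in CHATML_SPECIAL if text.startswith(s, i)), None)
            if match is not None:
                tokens.append(match)
                i += len(match)
                continue
        ch = text[i]
        if ch.isalnum() or ch == "_":
            # Consume a run of word characters as one piece.
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            # Any other character (punctuation, newline, space) is its own piece.
            tokens.append(ch)
            i += 1
    return tokens


if __name__ == "__main__":
    prompt = build_chatml("You are a helpful assistant.", "Hi")
    print(toy_tokenize("<|im_start|>system\n", parse_special=True))
    print(toy_tokenize("<|im_start|>system\n", parse_special=False))
```

If a real backend behaves like the `parse_special=False` case, the model never sees the single ChatML control token it was trained on. That could plausibly explain why the same prompt behaves differently across frontends, though this sketch does not prove which backend is at fault.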

Can you add TheDrummer/UnslopNemo-12B-v3-GGUF?
