Spaces:

open-llm-leaderboard
/

open_llm_leaderboard

Running on CPU Upgrade

App Files Files Community

1012

Feature request: Run 100B + models automatically

#434

by ChuckMcSneed - opened Dec 7, 2023

Discussion

ChuckMcSneed

Dec 7, 2023

Goliath-120B(https://huggingface.co/alpindale/goliath-120b) was submitted for evaluation almost a month ago. There are still no results. According to @clefourrier there is not enough memory to run it. Please fix it.

clefourrier

Open LLM Leaderboard org Dec 7, 2023

Hi @ChuckMcSneed ,
As mentioned in the other discussion, our backend cannot at the moment manage to evaluate models this big (as they don't fit on one A100 node). We will add the feature in our roadmap.

Thank you for opening this issue! We'll keep track of it!

clefourrier changed discussion title from Goliath-120B evaluation to Feature request: Run 100B + models automatically Dec 7, 2023

clefourrier

Open LLM Leaderboard org Dec 7, 2023

Note: When we add this feature, we will also need to relaunch these two models:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Sao10K/Solus-103B-L2_eval_request_False_float16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Sao10K/Lila-103B-L2_eval_request_False_float16_Original.json

TNTOutburst

Dec 7, 2023

I noticed falcon-180b got removed from the leaderboard and these finetunes still on it:
OpenBuddy/openbuddy-falcon-180b-v13-preview0
OpenBuddy/openbuddy-falcon-180b-v12-preview0
Are they needing to be retested?

clefourrier

Open LLM Leaderboard org Dec 8, 2023

Hi! Falcon-180B is still on the leaderboard if you select the little toggle to "Show gated/deleted/private models"

jphme

Dec 11, 2023

We would also be very happy to see this feature added, submitted DiscoResearch/DiscoLM-120b some time ago and didn't know what was the reason for it being stuck at "pending".

Maybe you could add some manual job to run >70b models on demand on a 4*A100 instance?

Thank you for you work on the leaderboard!

KnutJaegersberg

Dec 27, 2023

Can you eval bnb 4 bit quantizations of large models? It be beneficial to have some kind of indication. I was thinking about quip# 2bitting a goliath model, so I can fit it on my gpu locally, but that will take like 1.5 weeks to calculate. If it's not better I dont feel like doing it.
So if I use bnb to 4 bit goliath and submit it for eval will it work?

KnutJaegersberg

Dec 27, 2023

I've submitted TheBloke/DiscoLM-120b-GPTQ now. Hope that works.

ChuckMcSneed

Jan 26

@clefourrier How is the progress? Still trying to implement it or is it just too expensive?

ChuckMcSneed

Feb 24

@clefourrier Why was MegaDolphin-120b successfully tested while all the other 120b models have failed?

clefourrier

Open LLM Leaderboard org Feb 24

Hi @ChuckMcSneed ,
Can you link the request file? I suspect it was submitted quantized - which could barely fit on our GPUs.

ChuckMcSneed

Feb 24

https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/cognitivecomputations/MegaDolphin-120b_eval_request_False_float16_Original.json
Doesn't seem to be quantized.

clefourrier

Open LLM Leaderboard org Feb 26

Interesting!
It might be because we went from using A100 to using H100, and they don't seem to manage memory exactly the same, which could have allowed a slightly bigger model to fit (but just barely).
Other idea: @SaylorTwift did you launch MegaDolphin manually?

If not, we could try relaunching some of the bigger models (like goliath) and see what happens

softwareweaver

May 28

Are 100+B models supported now?
I submitted softwareweaver/Twilight-Miqu-146B and it's status says RUNNING since yesterday.

clefourrier

Open LLM Leaderboard org May 29

Hi! They should be supported in most cases, but might still fail, the system is not super robust yet.
However, just FYI, since a 70B takes at minimum 10h to evaluate, don't expect a 146B to evaluate instantaneously XD

softwareweaver

May 29

Thanks @clefourrier
Someone on Localllama forum said 100+B models are not supported and this thread did not have a definite answer.

clefourrier changed discussion status to closed Jun 21

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment