Model eval request FAILED ... how do we know the root cause?
I requested an eval for our new model three times, but all attempts failed, even though the eval runs successfully when we use lighteval ourselves.
Lighteval Run Commands
!git clone https://github.com/huggingface/lighteval.git
%cd lighteval
!pip install -e . && pip install accelerate
!wget https://raw.githubusercontent.com/huggingface/lighteval/main/examples/tasks/all_arabic_tasks.txt -O examples/tasks/all_arabic_tasks.txt
%env HF_DATASETS_TRUST_REMOTE_CODE=1
!accelerate launch -m lighteval accelerate \
    --model_args="pretrained=silma-ai/SILMA-9B-Instruct-v0.1.1,trust_remote_code=True" \
    --custom_tasks community_tasks/arabic_evals.py \
    --tasks examples/tasks/all_arabic_tasks.txt \
    --override_batch_size 1 --save_details --output_dir="./output_gpt2"
Model request file below:
https://huggingface.co/datasets/OALL/requests/commit/c6a182a11b637ed7787bbedab46f63d5c690f1a9
My question: How can we determine the cause of the failure on your side so we can resolve the issue?
Hey @karimouda,
Apologies for the late reply. I see that the model is based on Gemma2 9B, which also fails to run (we are still investigating that issue).
The main issue is that you are launching your evals with the trust_remote_code=True flag, which we don't support!
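For what it's worth, one quick way to check whether the model weights themselves actually need remote code is to try loading the config with trust_remote_code=False. Below is a minimal sketch using the model ID from the request above; it assumes a transformers release recent enough to include native Gemma2 support (the architecture was added upstream, so no custom code should be required for the model itself):

from transformers import AutoConfig

# Sketch: try resolving the config without allowing remote code.
# If this succeeds, the model architecture is natively supported
# and trust_remote_code=True is not needed for the model weights.
config = AutoConfig.from_pretrained(
    "silma-ai/SILMA-9B-Instruct-v0.1.1",
    trust_remote_code=False,
)
print(config.model_type)  # expected to print "gemma2" if no custom code is involved

If this raises an error asking you to pass trust_remote_code=True, the repo ships custom modeling code; if it loads cleanly, the flag is only being consumed elsewhere (e.g. by the datasets).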
Thanks Ali for your response. Is there anything we could do on our side to make it work, or should we wait until the Gemma2 issue is resolved?
Also, as far as I understand, trust_remote_code=True is mandatory for the Arabic datasets used in lighteval. Is there a way we could run the eval without it?
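To illustrate the dataset side of this question: the run commands above set HF_DATASETS_TRUST_REMOTE_CODE=1, which suggests at least some of the task datasets ship loading scripts. A quick local check, sketched below with a hypothetical placeholder dataset ID (substitute one of the actual Arabic task datasets), would be to load it with trust_remote_code=False and see whether datasets raises an error:

from datasets import load_dataset

# "OALL/some-arabic-dataset" is a hypothetical placeholder, not a real task dataset.
# If the dataset relies on a loading script, this call fails unless
# trust_remote_code=True (or HF_DATASETS_TRUST_REMOTE_CODE=1) is set;
# datasets stored in plain Parquet/CSV/JSON load fine without the flag.
ds = load_dataset(
    "OALL/some-arabic-dataset",
    split="test",
    trust_remote_code=False,
)
print(ds)

Datasets that fail this check would need to be converted to a script-free format (e.g. Parquet) before the eval could run without the flag.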