Update app.py

app.py CHANGED
@@ -81,8 +81,6 @@ def get_model_info(df):
     return df
 
 
-
-
 def calculate_highest_combined_score(data, column):
     # Ensure the column exists and has numeric data
     if column not in data.columns or not pd.api.types.is_numeric_dtype(data[column]):
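For readers skimming this hunk: `calculate_highest_combined_score` guards against missing or non-numeric columns before ranking. Below is a minimal, hypothetical sketch of that guard pattern on its own; the `safe_top_scores` helper and the sample data are illustrative and are not part of app.py.

```python
# Hypothetical illustration of the guard used in calculate_highest_combined_score.
import pandas as pd

def safe_top_scores(data: pd.DataFrame, column: str, top_n: int = 3) -> pd.DataFrame:
    # Skip columns that are absent or non-numeric, as the check above does.
    if column not in data.columns or not pd.api.types.is_numeric_dtype(data[column]):
        return pd.DataFrame()
    return data.nlargest(top_n, column)[["Model", column]]

df = pd.DataFrame({"Model": ["a", "b", "c"], "Average": [51.2, 48.7, 55.0]})
print(safe_top_scores(df, "Average"))   # top rows by Average
print(safe_top_scores(df, "AGIEval"))   # missing column -> empty frame
```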
@@ -142,22 +140,10 @@ def main():
     st.title("π YALL - Yet Another LLM Leaderboard")
     st.markdown("Leaderboard made with π§ [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) using [Nous](https://huggingface.co/NousResearch) benchmark suite.")
 
-    #
+    # Create tabs for leaderboard and about section
     content = create_yall()
-
-    # Ensure 'content' has a value before proceeding
-    if content:
-        df = convert_markdown_table_to_dataframe(content)
-        df = get_and_update_model_info(df)
-        score_columns = ['Average', 'AGIEval', 'GPT4All', 'TruthfulQA', 'Bigbench']
-        for col in score_columns:
-            if col in df.columns:
-                df[col] = pd.to_numeric(df[col], errors='coerce')
-        display_highest_combined_scores(df, score_columns)
-
     tab1, tab2 = st.tabs(["π Leaderboard", "π About"])
 
-
     # Leaderboard tab
     with tab1:
         if content:
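The block removed here parsed the markdown leaderboard into a DataFrame and coerced the score columns to numeric before calling `display_highest_combined_scores`. A self-contained sketch of that coercion step follows; `convert_markdown_table_to_dataframe`, `get_and_update_model_info`, and `display_highest_combined_scores` are helpers defined elsewhere in app.py and are intentionally left out.

```python
# Illustrative sketch of the numeric coercion done by the removed block.
import pandas as pd

def coerce_score_columns(df: pd.DataFrame, score_columns: list) -> pd.DataFrame:
    # errors='coerce' turns non-parsable cells into NaN instead of raising.
    for col in score_columns:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')
    return df

raw = pd.DataFrame({"Model": ["x", "y"], "Average": ["52.3", "n/a"]})
scored = coerce_score_columns(raw, ['Average', 'AGIEval', 'GPT4All', 'TruthfulQA', 'Bigbench'])
print(scored.dtypes)  # Average is now float64, with NaN where parsing failed
```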
@@ -257,32 +243,26 @@ def main():
                 st.error(str(e))
         else:
             st.error("Failed to download the content from the URL provided.")
-
     # About tab
     with tab2:
         st.markdown('''
 ### Nous benchmark suite
-
 Popularized by [Teknium](https://huggingface.co/teknium) and [NousResearch](https://huggingface.co/NousResearch), this benchmark suite aggregates four benchmarks:
-
 * [**AGIEval**](https://arxiv.org/abs/2304.06364) (0-shot): `agieval_aqua_rat,agieval_logiqa_en,agieval_lsat_ar,agieval_lsat_lr,agieval_lsat_rc,agieval_sat_en,agieval_sat_en_without_passage,agieval_sat_math`
 * **GPT4ALL** (0-shot): `hellaswag,openbookqa,winogrande,arc_easy,arc_challenge,boolq,piqa`
 * [**TruthfulQA**](https://arxiv.org/abs/2109.07958) (0-shot): `truthfulqa_mc`
 * [**Bigbench**](https://arxiv.org/abs/2206.04615) (0-shot): `bigbench_causal_judgement,bigbench_date_understanding,bigbench_disambiguation_qa,bigbench_geometric_shapes,bigbench_logical_deduction_five_objects,bigbench_logical_deduction_seven_objects,bigbench_logical_deduction_three_objects,bigbench_movie_recommendation,bigbench_navigate,bigbench_reasoning_about_colored_objects,bigbench_ruin_names,bigbench_salient_translation_error_detection,bigbench_snarks,bigbench_sports_understanding,bigbench_temporal_sequences,bigbench_tracking_shuffled_objects_five_objects,bigbench_tracking_shuffled_objects_seven_objects,bigbench_tracking_shuffled_objects_three_objects`
-
 ### Reproducibility
-
 You can easily reproduce these results using π§ [LLM AutoEval](https://github.com/mlabonne/llm-autoeval/tree/master), a colab notebook that automates the evaluation process (benchmark: `nous`). This will upload the results to GitHub as gists. You can find the entire table with the links to the detailed results [here](https://gist.github.com/mlabonne/90294929a2dbcb8877f9696f28105fdf).
-
 ### Clone this space
-
 You can create your own leaderboard with your LLM AutoEval results on GitHub Gist. You just need to clone this space and specify two variables:
-
 * Change the `gist_id` in [yall.py](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard/blob/main/yall.py#L126).
 * Create "New Secret" in Settings > Variables and secrets (name: "github", value: [your GitHub token](https://github.com/settings/tokens))
-
 A special thanks to [gblazex](https://huggingface.co/gblazex) for providing many evaluations.
 ''')
+
+
+
 
 # Run the main function if this script is run directly
 if __name__ == "__main__":
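The "Clone this space" notes in the About tab mention two settings: the `gist_id` in yall.py and a space secret named `github`. As a rough, hypothetical illustration of how such a gist could be fetched with that secret, the sketch below uses the standard GitHub gists REST endpoint; it is not the actual yall.py code, which is not shown in this diff.

```python
# Hedged sketch only: fetch a results gist using the "github" space secret.
# The real retrieval logic lives in yall.py and is not part of this diff.
import os
import requests

GIST_ID = "your_gist_id_here"  # placeholder: replace with your own gist id

def fetch_gist_markdown(gist_id: str) -> str:
    headers = {}
    token = os.environ.get("github")  # Hugging Face exposes space secrets as env vars
    if token:
        headers["Authorization"] = f"token {token}"
    resp = requests.get(f"https://api.github.com/gists/{gist_id}", headers=headers, timeout=30)
    resp.raise_for_status()
    files = resp.json()["files"]
    # Assume the first file in the gist holds the markdown leaderboard table.
    return next(iter(files.values()))["content"]
```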