WadoodAbdul committed on
Commit 48c47ed · 1 Parent(s): 17c27e0

updated metrics comparison

Files changed (1)
  1. src/about.py +5 -5
src/about.py CHANGED
@@ -63,9 +63,7 @@ LLM_BENCHMARKS_TEXT_1 = f"""
 
  The Named Clinical Entity Recognition Leaderboard is aimed at advancing the field of natural language processing in healthcare. It provides a standardized platform for evaluating and comparing the performance of various language models in recognizing named clinical entities, a critical task for applications such as clinical documentation, decision support, and information extraction. By fostering transparency and facilitating benchmarking, the leaderboard's goal is to drive innovation and improvement in NLP models. It also helps researchers identify the strengths and weaknesses of different approaches, ultimately contributing to the development of more accurate and reliable tools for clinical use. Despite its exploratory nature, the leaderboard aims to play a role in guiding research and ensuring that advancements are grounded in rigorous and comprehensive evaluations.
 
- ## How it works
-
- ### Evaluation method and metrics
  When training a Named Entity Recognition (NER) system, the most common evaluation methods involve measuring precision, recall, and F1-score at the token level. While these metrics are useful for fine-tuning the NER system, evaluating the predicted named entities for downstream tasks requires metrics at the full named-entity level. We include both evaluation methods: token-based and span-based. We provide an example below which helps in understanding the difference between the methods.
  Example Sentence: "The patient was diagnosed with a skin cancer disease."
  For simplicity, let's assume an example sentence that contains 10 tokens, with a single two-token disease entity (as shown in the figure below).
@@ -111,9 +109,11 @@ $$ Precision = COR / (COR + INC + SPU)$$
  $$ Recall = COR / (COR + INC + MIS)$$
  $$ f1score = 2 * (Prec * Rec) / (Prec + Rec)$$
 
- This span-based approach is equivalent to the Partial Match ("Type") in the nervaluate (NER evaluation considering partial match scoring) Python package.
- Further examples are presented in the section below (Other example evaluations).
 
  ## Datasets
  The following datasets (test splits only) have been included in the evaluation.
 
 
  The Named Clinical Entity Recognition Leaderboard is aimed at advancing the field of natural language processing in healthcare. It provides a standardized platform for evaluating and comparing the performance of various language models in recognizing named clinical entities, a critical task for applications such as clinical documentation, decision support, and information extraction. By fostering transparency and facilitating benchmarking, the leaderboard's goal is to drive innovation and improvement in NLP models. It also helps researchers identify the strengths and weaknesses of different approaches, ultimately contributing to the development of more accurate and reliable tools for clinical use. Despite its exploratory nature, the leaderboard aims to play a role in guiding research and ensuring that advancements are grounded in rigorous and comprehensive evaluations.
 
+ ## Evaluation method and metrics
  When training a Named Entity Recognition (NER) system, the most common evaluation methods involve measuring precision, recall, and F1-score at the token level. While these metrics are useful for fine-tuning the NER system, evaluating the predicted named entities for downstream tasks requires metrics at the full named-entity level. We include both evaluation methods: token-based and span-based. We provide an example below which helps in understanding the difference between the methods.
  Example Sentence: "The patient was diagnosed with a skin cancer disease."
  For simplicity, let's assume an example sentence that contains 10 tokens, with a single two-token disease entity (as shown in the figure below).
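To make the difference concrete, here is a minimal Python sketch of token-level scoring on the example sentence. It assumes the 10 tokens are the words plus the final period, that the two-token disease entity is "skin cancer", and that non-entity tokens carry an `O` tag. The `DISEASE` label, the prediction (which tags "cancer disease" instead), and the `token_macro_prf` helper are hypothetical and not taken from the leaderboard code. Macro-averaging over entity labels, excluding `O`, is likewise an assumption.

```python
from collections import Counter

# Example sentence split into 10 tokens (the final period is its own token).
TOKENS = ["The", "patient", "was", "diagnosed", "with", "a", "skin", "cancer", "disease", "."]
# Assumed gold labels: "skin cancer" is the two-token disease entity.
GOLD = ["O"] * 6 + ["DISEASE", "DISEASE"] + ["O", "O"]
# Hypothetical prediction: the model tags "cancer disease" instead.
PRED = ["O"] * 7 + ["DISEASE", "DISEASE"] + ["O"]


def token_macro_prf(gold, pred, outside="O"):
    """Per-token precision/recall/F1 for each entity label, macro-averaged over labels."""
    assert len(gold) == len(pred), "gold and predicted tag sequences must align"
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != outside:
                tp[g] += 1  # token tagged with the correct entity label
            continue
        if p != outside:
            fp[p] += 1  # token wrongly tagged with label p
        if g != outside:
            fn[g] += 1  # gold entity token that was missed or mislabelled
    labels = sorted((set(gold) | set(pred)) - {outside})
    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        prec = tp[label] / p_den if p_den else 0.0
        rec = tp[label] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores.append((prec, rec, f1))
    if not scores:
        return 0.0, 0.0, 0.0
    n = len(scores)
    # Unweighted (macro) mean of each metric across entity labels.
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


print(token_macro_prf(GOLD, PRED))  # (0.5, 0.5, 0.5)
```

Only one of the two gold tokens ("cancer") is recovered and one extra token ("disease") is tagged, so precision, recall and F1 all come out at 0.5 even though the predicted span overlaps the gold entity.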
 
  $$ Recall = COR / (COR + INC + MIS)$$
  $$ f1score = 2 * (Prec * Rec) / (Prec + Rec)$$
 
+ Note:
+ 1. The span-based approach here is equivalent to 'Span Based Evaluation with Partial Overlap' in [NER Metrics Showdown!](https://huggingface.co/spaces/wadood/ner_evaluation_metrics) and to Partial Match ("Type") in the nervaluate Python package.
+ 2. The token-based approach here is equivalent to 'Token Based Evaluation With Macro Average' in [NER Metrics Showdown!](https://huggingface.co/spaces/wadood/ner_evaluation_metrics).
 
+ Additional examples can be tested on the [NER Metrics Showdown!](https://huggingface.co/spaces/wadood/ner_evaluation_metrics) Hugging Face Space.
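The span-based formulas can be sketched the same way. This minimal Python version assumes nervaluate-style categories: a gold entity overlapped by a prediction of the same type is COR, one overlapped only by a prediction of another type is INC, a gold entity with no overlapping prediction is MIS, and a prediction left unmatched is SPU. Matching is greedy and one-to-one, which is a simplification. The `Span` class and helper names are hypothetical, and the scores are not guaranteed to match nervaluate or the NER Metrics Showdown space on edge cases such as one prediction overlapping several gold entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)
    label: str  # entity type, e.g. "DISEASE" (hypothetical label)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one token."""
    return a.start < b.end and b.start < a.end


def span_counts(gold: list[Span], pred: list[Span]) -> tuple[int, int, int, int]:
    """Return (COR, INC, SPU, MIS) using greedy one-to-one partial-overlap matching."""
    cor = inc = mis = 0
    used: set[int] = set()  # indices of predictions already matched to a gold span
    for g in gold:
        hits = [i for i, p in enumerate(pred) if i not in used and overlaps(g, p)]
        if not hits:
            mis += 1  # no prediction touches this gold entity
            continue
        same_type = [i for i in hits if pred[i].label == g.label]
        if same_type:
            cor += 1  # overlapping prediction with the correct type
            used.add(same_type[0])
        else:
            inc += 1  # overlapping prediction, but with the wrong type
            used.add(hits[0])
    spu = len(pred) - len(used)  # predictions that matched no gold entity
    return cor, inc, spu, mis


def span_prf(cor: int, inc: int, spu: int, mis: int) -> tuple[float, float, float]:
    """Precision = COR/(COR+INC+SPU), Recall = COR/(COR+INC+MIS), F1 = harmonic mean."""
    p_den = cor + inc + spu
    r_den = cor + inc + mis
    precision = cor / p_den if p_den else 0.0
    recall = cor / r_den if r_den else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Gold entity "skin cancer" (tokens 6-7); hypothetical prediction "cancer disease" (tokens 7-8).
gold = [Span(6, 8, "DISEASE")]
pred = [Span(7, 9, "DISEASE")]
counts = span_counts(gold, pred)
print(counts, span_prf(*counts))  # (1, 0, 0, 0) (1.0, 1.0, 1.0)
```

With partial overlap, predicting "cancer disease" for the gold entity "skin cancer" counts as fully correct, whereas token-level scoring of the same prediction gives 0.5 because only one of the two gold tokens is recovered.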
 
  ## Datasets
  The following datasets (test splits only) have been included in the evaluation.