ozyman committed on
Commit
688becf
·
verified ·
1 Parent(s): d662ca6

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -128,7 +128,7 @@ F1 score was used to measure performance, prioritizing detection of noncomplianc
 
 ### Results
 
-NAVI-small-preview achieved an F1 score of 86.8% on public subset of PAV dataset, outperforming all tested alternatives except full-scale NAVI. We evaluate against general-purpose solutions like Claude and Open AI models, as well as some guardrails focusing on groundedness to demonstrate a clear distinction of policy verification from the more common groundedness.
+NAVI-small-preview achieved an F1 score of 86.8% on public subset of PAV dataset, outperforming all tested alternatives except full-scale NAVI. We evaluate against general-purpose solutions like Claude and Open AI models, as well as some guardrails focusing on groundedness to demonstrate a clear distinction of policy verification from the more common groundedness verification.
 
 | Model | F1 Score | Avg Latency (ms) |
 |--------------------------|----------|------------------|