Update README.md
README.md
CHANGED
@@ -128,7 +128,7 @@ F1 score was used to measure performance, prioritizing detection of noncompliance
 
 ### Results
 
-NAVI-small-preview achieved an F1 score of 86.8% on the public subset of the PAV dataset, outperforming all tested alternatives except full-scale NAVI. We evaluate against general-purpose solutions such as Claude and OpenAI models, as well as guardrails focused on groundedness, to demonstrate a clear distinction between policy verification and the more common groundedness.
+NAVI-small-preview achieved an F1 score of 86.8% on the public subset of the PAV dataset, outperforming all tested alternatives except full-scale NAVI. We evaluate against general-purpose solutions such as Claude and OpenAI models, as well as guardrails focused on groundedness, to demonstrate a clear distinction between policy verification and the more common groundedness verification.
 
 | Model | F1 Score | Avg Latency (ms) |
 |--------------------------|----------|------------------|