--- language: en license: apache-2.0 --- # Model Card ## Metrics - position: -1 - layer: 30 - refusal_score: -13.144852638244629 - refusal_score_baseline: 4.240177154541016 - steering_score: 3.974774122238159 - steering_score_baseline: -14.775822639465332 - kl_div_score: 0.06443626040695713 - no_filter: 7 - nan_values: 0 - late_layer: 50 - high_kl: 135 - low_refusal: 48