INDEX
Negative Logits
Safe        -0.58
risk        -0.52
warning     -0.51
Safe        -0.51
protection  -0.49
dangerous   -0.48
safety      -0.48
warn        -0.46
danger      -0.46
afety       -0.45
Positive Logits
WithIOException  0.84
informée         0.83
Reſ              0.82
Anſ              0.80
kasarigan        0.79
Theſe            0.79
themſelves       0.78
$_"              0.78
NameInMap        0.77
Jefus            0.77
Activations Density 0.015%