Negative Logits
Token       Value
Severity    0.73
oloj        0.68
think       0.65
蔷          0.64
the         0.61
Aspect      0.61
Speaker     0.61
Agustus     0.61
th          0.60
atrice      0.60

Positive Logits
Token       Value
ве          0.73
פ           0.66
фами        0.66
ב           0.66
誣          0.66
المط        0.65
РИ          0.64
כן          0.63
apă         0.63
מי          0.62

Activation Density: 0.000%
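
The dashboard does not state how these lists were produced. One common approach for sparse-autoencoder feature dashboards, assumed here and not confirmed by the source, is to project the feature's decoder direction through the model's unembedding vectors, report the highest and lowest resulting logit contributions, and define activation density as the share of token positions where the feature activation is positive. The minimal pure-Python sketch uses toy vectors; `logit_effects`, `top_and_bottom`, and `activation_density` are hypothetical helper names, not part of any real dashboard API.

```python
# Hypothetical sketch: helper names and toy data are illustrative assumptions,
# not the dashboard's actual implementation.
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k largest (positive) and k smallest (negative) logit effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = ranked[-k:][::-1]  # largest first
    negative = ranked[:k]         # most negative first
    return positive, negative


def activation_density(acts: Sequence[float]) -> float:
    """Percentage of token positions where the feature activation is positive."""
    if not acts:
        return 0.0
    return 100.0 * sum(1 for a in acts if a > 0) / len(acts)


if __name__ == "__main__":
    toy_unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
    effects = logit_effects([0.5, 0.2], toy_unembed)
    print(top_and_bottom(effects, 1))
    print(f"{activation_density([0.0, 1.2, 0.0, 0.0]):.3f}%")
```

Under this reading, a reported density of 0.000% means the feature was positive on essentially none of the sampled token positions, since any rate at or above 0.0005% would round to a nonzero value at three decimal places.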