Explanations
references to innocence and the protection of innocent individuals
Negative Logits
Token        Logit
Mathias      -0.72
ilangkan     -0.65
وتسجيلات     -0.61
shifted      -0.60
agot         -0.60
visst        -0.59
Pelham       -0.59
impresa      -0.57
arım         -0.57
Suivez       -0.57
Positive Logits
Token        Logit
innocent     1.93
Innoc        1.77
Innocent     1.77
innocent     1.70
innocence    1.64
innoc        1.58
innoc        1.52
Innocence    1.48
inocente     1.35
Innoc        1.33
Activations Density: 0.175%