INDEX
Explanations
persons responsible for criminal or violent acts
Negative Logits
ories      -0.81
izoph      -0.67
ilion      -0.66
earchers   -0.66
etting     -0.65
aeda       -0.65
equality   -0.65
Invalid    -0.65
Bridge     -0.65
arus       -0.63
Positive Logits
who        1.28
whom       1.14
who        1.11
hood       0.99
named      0.95
surn       0.93
whose      0.89
's         0.88
esses      0.85
WHO        0.85
Activations Density 2.636%