Explanations
expressions related to violence and violent behavior
Negative Logits
Normdatei   -0.91
noDo        -0.71
Schuyler    -0.70
NOPQRST     -0.70
sembler     -0.67
quelize     -0.66
guava       -0.63
essment     -0.63
zzang       -0.63
>=",        -0.63
Positive Logits
violence    1.20
Violence    1.16
violence    1.07
Violent     1.07
Violent     0.97
Violence    0.96
violent     0.96
violen      0.95
violent     0.94
VIOL        0.88
Activation density: 0.005%