INDEX
Explanations
words related to violence and criminal behavior
Negative Logits
Token          Logit
ciplinary      -0.80
iosyncr        -0.77
spons          -0.77
translation    -0.76
taboola        -0.75
pora           -0.70
pmwiki         -0.69
lighting       -0.69
aston          -0.68
fitted         -0.66
Positive Logits
Token          Logit
him            1.32
somebody       1.20
someone        1.19
anybody        1.15
them           1.13
anyone         1.06
Him            1.03
oneself        0.96
someone        0.96
whoever        0.95
Activations Density: 0.261%