INDEX
Explanations
references to violent crimes and acts of aggression
Negative Logits
ars      -0.18
imas     -0.16
ai       -0.16
esters   -0.15
anders   -0.15
å¾       -0.14
pos      -0.14
èįĴ      -0.14
atter    -0.14
cmp      -0.13
Positive Logits
fellow    0.18
imore     0.16
NavItem   0.15
rowsing   0.15
Tome      0.14
wart      0.14
iveness   0.14
oran      0.14
awe       0.14
part      0.14
Activations Density 0.094%