INDEX
Explanations
terms related to extremist ideologies or groups
references to neo-Nazi ideology and groups
Negative Logits
hereby          -0.76
bets            -0.73
inhal           -0.72
contacting      -0.72
emoji           -0.72
overdue         -0.71
outstanding     -0.71
outp            -0.71
accompanying    -0.71
iT              -0.68
Positive Logits
Nazi            1.61
Nazis           1.50
liberal         1.50
fascist         1.34
femin           1.30
colonial        1.27
conservative    1.24
liber           1.24
radical         1.17
natal           1.17
Activations Density: 0.028%
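
The density figure can be read as the share of token positions on which this feature fires. The sketch below shows one way such a number could be computed, assuming "fires" means an activation strictly above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and synthetic, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 10_000
for position in (10, 2_500, 9_999):
    sample[position] = 1.2

density = activation_density(sample)
print(f"Activations Density: {density:.3%}")
```

A density in this range means the feature is sparse: it is silent on nearly all tokens and activates only on a small, specific set of contexts.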