INDEX
Explanations
phrases related to political ideologies and groups
references to neo-Nazi groups and related ideologies
Negative Logits
intervals     -0.73
exceptions    -0.69
eele          -0.65
AVG           -0.65
*/(           -0.65
Petty         -0.65
cliffe        -0.65
ulhu          -0.64
ado           -0.64
WD            -0.64
Positive Logits
centric       0.97
Georg         0.96
destruct      0.94
Nazi          0.92
driven        0.92
optim         0.91
induced       0.88
analysis      0.88
inspired      0.87
friendly      0.87
Activations Density: 0.130%