INDEX
Explanations
references to neo-Nazi groups
references to neo-Nazi ideology and groups
Negative Logits
ILCS        -0.92
=-=-=-=-    -0.78
Medium      -0.77
Issue       -0.77
ANGE        -0.75
hips        -0.73
perature    -0.72
loo         -0.71
ingham      -0.71
room        -0.71
Positive Logits
neo           1.05
Neo           0.80
nihil         0.77
fascist       0.75
emer          0.75
igen          0.72
supremacist   0.71
mascul        0.69
ge            0.69
utsche        0.69
Activations Density 0.005%