INDEX
Explanations
phrases or words related to extremist ideologies, particularly Neo-Nazism
terminology associated with neo-Nazi ideologies and movements
Negative Logits
channelAvailability    -0.77
hips                   -0.77
ãĤ¼ãĤ¦ãĤ¹              -0.76
loo                    -0.74
Interstitial           -0.73
ILCS                   -0.71
enance                 -0.70
pring                  -0.70
IUM                    -0.70
REDACTED               -0.69
Positive Logits
ge                     0.94
flex                   0.87
Nazi                   0.83
forming                0.81
fer                    0.81
-                      0.79
emer                   0.77
chal                   0.76
form                   0.76
formed                 0.75
Activations Density 0.026%
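
The Negative Logits entry `ãĤ¼ãĤ¦ãĤ¹` looks like a byte-level BPE token printed in its raw vocabulary form, not page corruption. The sketch below shows one way to turn such an entry back into readable text. It assumes the model uses a GPT-2 style byte-to-character table, which the page itself does not state. The names `byte_to_char_table` and `decode_vocab_token` are hypothetical helpers, not part of any dashboard API.

```python
def byte_to_char_table():
    """Build a GPT-2 style map from byte values (0-255) to printable characters.

    Assumption: the tokenizer follows the GPT-2 byte-level convention.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to a code point at 256 or above.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse map: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_to_char_table().items()}


def decode_vocab_token(token):
    """Convert a raw vocabulary string back to the text it encodes."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A single token can hold a partial UTF-8 sequence, so replace bad bytes.
    return raw.decode("utf-8", errors="replace")


print(decode_vocab_token("ãĤ¼ãĤ¦ãĤ¹"))
```

Under that assumption the entry decodes to the katakana string ゼウス, while plain ASCII entries such as `Nazi` or `flex` come back unchanged.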