Explanations
references to fascism, extremism, and far-right political ideology
Negative Logits
溜            -0.47
Christmas     -0.46
meyi          -0.44
rzej          -0.44
rpc           -0.44
interess      -0.43
INTERESAR     -0.42
Mehl          -0.42
тифика        -0.42
Weihnachten   -0.41
Positive Logits
extremist      0.98
extré          0.88
extremism      0.88
totalitarian   0.87
extremists     0.87
extremos       0.86
Extrem         0.85
fascist        0.81
extré          0.80
Hitler         0.79
Activations Density: 0.402%