INDEX
Explanations
references to Nazi Germany and its historical actions
Negative Logits
Token            Logit
Hentet           -0.39
tagext           -0.37
Liver            -0.36
flux             -0.36
addPreferredGap  -0.36
nucle            -0.35
úgó              -0.35
australiano      -0.35
IBS              -0.34
adelant          -0.34
Positive Logits
Token            Logit
Hitler           1.05
Hitler           0.93
Nazi             0.88
NSD              0.83
Mussolini        0.82
Fas              0.77
fascist          0.77
SS               0.76
Reich            0.75
Führer           0.74
Activations Density: 0.444%