Explanations
references to Nazi-related terms and historical events
Negative Logits
Token     Logit
tis       -1.17
Dub       -1.08
area      -1.07
Asset     -1.06
20439     -1.05
pring     -1.04
clip      -0.99
ths       -0.99
WHERE     -0.97
ional     -0.97

Positive Logits
Token     Logit
salute     1.22
sympath    1.19
chwitz     1.14
Youth      1.14
Hitler     1.09
takeover   1.04
enthal     1.01
ollah      0.99
abad       0.99
etz        0.98
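
The logit lists rank output tokens by how strongly this feature raises or lowers their logits. This page does not say how they were produced. One common approach for sparse-autoencoder features, assumed here, projects the feature's decoder direction through the model's unembedding matrix and sorts the resulting scores. The sketch below uses a hypothetical helper `top_logits` and tiny made-up vectors. Real dashboards may normalize or filter differently.

```python
# Hypothetical sketch: rank vocabulary tokens by the direct logit effect of one feature.
# Assumption: effect on token j = dot(decoder_dir, column j of the unembedding matrix).

def top_logits(decoder_dir, unembed, vocab, k):
    """Return (top-k positive, top-k negative) lists of (token, score) pairs.

    decoder_dir: list of d_model floats (the feature's decoder vector).
    unembed: d_model rows, each a list of len(vocab) floats (unembedding matrix).
    vocab: token strings, one per unembedding column.
    """
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with token j's unembedding column.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    # Highest scores first for the positive list, lowest first for the negative list.
    positive = sorted(scores, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda p: p[1])[:k]
    return positive, negative


pos, neg = top_logits([1.0, -0.5],
                      [[0.9, 0.1, -0.3], [-0.4, 0.6, 0.2]],
                      ["salute", "area", "Youth"], 1)
print(pos, neg)  # "salute" scores about 1.1, "Youth" about -0.4
```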

Activation density: 0.965%
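
Activation density is the share of tokens on which the feature is active. A density of 0.965% means the feature fires on roughly one token in a hundred. The minimal sketch below assumes "active" means a strictly positive activation, which is the usual convention for ReLU-style SAE features, although this page does not define it. `activation_density` is a hypothetical helper name.

```python
def activation_density(activations):
    """Percentage of tokens whose feature activation is strictly positive.

    Assumption: a token counts as active when its activation is > 0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


print(activation_density([0.0, 2.3, 0.0, 0.0]))  # 25.0
```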