Explanations
references to significant historical events or figures related to World War II
Negative Logits
sey         -0.15
hani        -0.15
ì°°         -0.15
ucus        -0.14
tic         -0.14
_HOT        -0.14
ãģ«åIJij     -0.14
lisi        -0.14
Anc         -0.14
nia         -0.14
Positive Logits
Hitler      0.23
Hit         0.19
193         0.19
Revision    0.17
Nazi        0.17
NS          0.17
Hit         0.17
Blitz       0.16
194         0.16
Adolf       0.15
Activations Density 0.099%