Explanations
references to wars, particularly World Wars and their specific contexts
Negative Logits
fourth      -0.15
fifth       -0.15
third       -0.15
sixth       -0.14
forth       -0.14
Stim        -0.13
xxxxxxxx    -0.13
forth       -0.13
unf         -0.13
ninth       -0.13
Positive Logits
II          0.27
II          0.23
العالمية    0.19
(World      0.19
Two         0.18
Two         0.17
âħ          0.16
lesia       0.16
coma        0.16
zimmer      0.16
Activations Density 0.007%