INDEX
Explanations
references to historical events and their contexts
Negative Logits
ossil       -0.16
aille       -0.15
bear        -0.14
usu         -0.14
arf         -0.14
disposal    -0.14
iginal      -0.14
plat        -0.13
dispose     -0.13
iscal       -0.13
Positive Logits
Bat         0.21
bat         0.20
Gu          0.20
gu          0.19
guerra      0.19
camp        0.19
Camp        0.19
invasion    0.18
Exped       0.18
bat         0.18
Activations Density 0.051%