INDEX
Explanations
mentions of wars, particularly World Wars I and II
Negative Logits
yun         -0.17
yyy         -0.16
third       -0.15
fourth      -0.15
yyyy        -0.14
xxxxxxxx    -0.14
yk          -0.13
yen         -0.13
ropic       -0.13
ê³          -0.13
Positive Logits
II          0.24
Two         0.23
Two         0.19
ll          0.18
âħ          0.18
-era        0.18
One         0.18
lord        0.17
edor        0.17
lords       0.17
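Logit lists like the two above are usually read as the feature's direct effect on the output vocabulary. The following is a minimal sketch of how such a ranking could be produced. It assumes the common convention of multiplying the feature's decoder direction by the model's unembedding matrix, which the dashboard itself does not state. The helper names `logit_effects` and `top_and_bottom`, the two-dimensional decoder direction, and the five-token vocabulary are all hypothetical toy values. They reuse a few displayed tokens for readability and are not the real feature's weights.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect,
# ASSUMING effect = decoder direction @ unembedding matrix (no layer norm, no bias).

def logit_effects(decoder_dir, W_U):
    """Dot the decoder direction (length d_model) with each column of W_U.

    W_U is a list of d_model rows, each holding one weight per vocabulary token.
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[r] * W_U[r][t] for r in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(tokens, effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    return ranked[::-1][:k], ranked[:k]


# Toy values (hypothetical, chosen to be exact in binary floating point).
tokens = ["II", "Two", "yun", "third", "lord"]
decoder_dir = [1.0, 0.5]
W_U = [
    [0.25, 0.125, -0.25, -0.125, 0.0],
    [0.0, 0.0625, -0.125, 0.0, 0.0625],
]

effects = logit_effects(decoder_dir, W_U)
top, bottom = top_and_bottom(tokens, effects, k=2)
print("positive:", top)
print("negative:", bottom)
```

Ranking a single projected vector keeps the computation cheap: one dot product per vocabulary token. That is why dashboards can show only the top and bottom handful of tokens.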
Activation Density: 0.006%
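Activation density is commonly the percentage of sampled tokens on which the feature fires. At 0.006%, that is roughly 6 of every 100,000 tokens. The sketch below works under that assumption and treats any activation above a threshold of 0 as firing. The threshold, the helper `activation_density`, and the synthetic activation list are hypothetical and illustrative only.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a feature fires, ASSUMING "fires" means activation > threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic input: 100,000 tokens, with the feature active on 6 of them.
activations = [0.0] * 100_000
for i in (10, 2_500, 40_000, 61_234, 77_777, 99_999):
    activations[i] = 1.5

density = activation_density(activations)
print(f"Activation density: {density:.3f}%")
```

A density this low marks a sparse feature. It stays silent on almost all text and fires only in narrow contexts, here mentions of the World Wars.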