INDEX
Explanations
instances of change or shifts in power
Negative Logits
perhaps
-0.18
surely
-0.15
perhaps
-0.15
Perhaps
-0.15
utsch
-0.14
gravity
-0.14
abra
-0.14
gravity
-0.14
opens
-0.14
.clips
-0.14
Positive Logits
{{
0.16
treaties
0.16
âĹ
0.15
Basically
0.14
nid
0.13
Classical
0.13
reform
0.13
naval
0.13
treaty
0.13
ãĤ¿ãĥ¼
0.13
Activations Density 0.000%