Explanations
connections and transitions in the narrative
Negative Logits
ickers      -0.17
tingham     -0.15
ulus        -0.15
/on         -0.15
ashboard    -0.15
tah         -0.14
emez        -0.14
Homeland    -0.13
Heg         -0.13
istor       -0.13
Positive Logits
itself      0.28
herself     0.25
zich        0.23
themselves  0.23
himself     0.23
yourself    0.21
oneself     0.21
się         0.19
myself      0.19
ourselves   0.18
Activations Density: 0.243%