INDEX
Explanations
references to unresolved story elements and endings in narratives
Negative Logits
Token        Logit
kÃ¶ÅŁ        -0.14
hek          -0.14
erspective   -0.14
æ¡Ĥ          -0.14
μον          -0.14
opak         -0.13
utsch        -0.13
ç°           -0.13
bish         -0.13
TestId       -0.13
Positive Logits
Token        Logit
leave        0.29
leaving      0.29
leaves       0.27
left         0.27
cliff        0.26
ending       0.25
Leave        0.24
leave        0.23
Leaving      0.22
Leave        0.22
Activations Density: 0.108%