Explanations
elements related to narrative structure or twists in storytelling
Negative Logits
lately       -0.15
ington       -0.15
Kend         -0.15
pletely      -0.15
recently     -0.14
now          -0.14
unks         -0.14
lsen         -0.14
urtles       -0.14
urret        -0.14
Positive Logits
throughout    0.19
igli          0.17
ault          0.16
disappoint    0.16
Throughout    0.16
reviewer      0.15
initially     0.15
Overall       0.15
chy           0.15
Overall       0.15
Activations Density 0.044%