INDEX
Explanations
references to storytelling or narratives
Negative Logits
ovich             -0.17
ir                -0.14
teness            -0.14
(PR               -0.14
Ãłi               -0.14
rog               -0.14
ãĥ¥               -0.14
евиÑĩ             -0.13
rogate            -0.13
------+------+    -0.13
Positive Logits
story              0.56
Story              0.48
stories            0.46
story              0.45
Story              0.44
tale               0.42
STORY              0.40
_story             0.40
Stories            0.40
-story             0.38
Activations Density 0.034%