INDEX
Explanations
stories or narratives
phrases that indicate contrasting narratives or viewpoints
Negative Logits
Token       Logit
isk         -0.70
aband       -0.70
tarian      -0.67
tarians     -0.65
edition     -0.65
eligible    -0.65
poon        -0.64
ktop        -0.64
[+          -0.63
qus         -0.63
Positive Logits
Token       Logit
tale        1.63
story       1.63
stories     1.50
tales       1.36
Story       1.29
STORY       1.29
truth       1.28
anecdote    1.19
truths      1.15
Stories     1.14
Activations Density 0.154%
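
The density figure is presumably the fraction of sampled token positions on which the feature activates at all; that definition is an assumption, since this page does not state it. A minimal Python sketch under that assumption, using a hypothetical helper `activation_density` and made-up activation values rather than this feature's data:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative data only: 1000 token positions, 2 of them active.
sample = [0.0] * 1000
sample[10] = 2.5
sample[500] = 0.8

density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.154% would mean the feature fires on roughly 1 or 2 of every 1000 sampled tokens.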