INDEX
Explanations
directives to provide information or recount stories
phrases that involve narration or storytelling
Negative Logits
imposed     -0.70
nam         -0.69
isk         -0.67
adesh       -0.66
cells       -0.65
ccording    -0.62
everal      -0.61
berus       -0.61
aband       -0.61
notor       -0.61
Positive Logits
tale        1.58
tales       1.05
ingly       1.05
stories     1.02
us          1.00
tale        0.98
lies        0.92
Stories     0.88
Tale        0.87
me          0.83
Activations Density 0.062%