INDEX
Explanations
second person pronouns followed by verbs
phrases addressing the reader directly
Negative Logits
uca          -0.71
etus         -0.67
dimension    -0.63
trapping     -0.63
ulner        -0.62
suspending   -0.59
ender        -0.58
charging     -0.58
abin         -0.57
odo          -0.57
Positive Logits
guessed      1.09
see          0.97
know         0.96
yourselves   0.94
guys         0.93
probably     0.90
read         0.89
remember     0.89
're          0.89
've          0.88
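A minimal sketch of how two ranked lists like these can be produced. It assumes, as is common for sparse-autoencoder feature dashboards, that each value is the feature's decoder direction projected through the model's unembedding matrix (one logit contribution per vocabulary token). The names `decoder_dir`, `unembed`, and `vocab` and the toy numbers are hypothetical and are not taken from this dashboard.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption: effect[t] = sum_i decoder_dir[i] * unembed[i][t]  (decoder_dir @ W_U).


def logit_effects(decoder_dir, unembed):
    """Return one logit contribution per vocabulary token."""
    n_vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(n_vocab)
    ]


def top_and_bottom(effects, vocab, k=10):
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: model width 2, vocabulary of 4 made-up tokens.
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, 1.0, -0.4, 0.0],
    [0.6, -0.2, 0.4, 1.0],
]
vocab = ["a", "b", "c", "d"]

neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and reading from both ends keeps the two lists consistent with each other; with a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid a full sort.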
Activations Density 0.109%
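Activation density is the share of tokens on which the feature fires, so 0.109% means roughly one token in every 900 or so. The sketch below assumes "fires" means an activation strictly above a threshold of 0; both the threshold and the function name `activation_density` are illustrative assumptions, not the dashboard's documented definition.

```python
# Hypothetical sketch: activation density as a percentage of tokens.
# Assumption: a token counts as "active" when its activation exceeds `threshold`.


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active token out of 1,000 gives a density of 0.1%.
acts = [0.0] * 999 + [2.5]
print(f"{activation_density(acts):.3f}%")
```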