Explanations
personal pronouns followed by a verb
references to the word "you"
Negative Logits
Ambro           -0.65
ンジ            -0.65
Dimension       -0.61
Contents        -0.61
interstitial    -0.60
Rosenthal       -0.60
Sabha           -0.58
dimension       -0.57
Gw              -0.57
Balt            -0.57
Positive Logits
're             1.30
wanna           1.06
want            1.01
've             1.01
guessed         0.99
wish            0.90
guys            0.90
hear            0.84
decide          0.84
subscribe       0.84
Activations Density 0.085%