INDEX
Explanations
instances of the word "you," indicating a focus on direct address or engagement with the reader
Negative Logits
was         -0.22
itself      -0.17
amp         -0.15
(s          -0.15
isnt        -0.14
¤ëĭ¤        -0.14
Was         -0.14
ãģłãĤįãģĨ   -0.14
говоÑĢиÑĤ    -0.14
ìĿ´ëĭ¤      -0.14
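Several of the tokens listed under Negative Logits look like raw byte-level BPE strings rather than readable text. The sketch below shows one way to turn them back into text, on the assumption that the underlying tokenizer uses the GPT-2 style byte-to-unicode table; the dashboard does not say which tokenizer it uses. The names `byte_decoder` and `decode_token` are hypothetical helpers introduced for illustration only.

```python
def byte_decoder() -> dict[str, int]:
    """Inverse of the GPT-2 style byte-to-unicode table (assumed tokenizer family)."""
    # Bytes that the table maps to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {chr(b): b for b in printable}
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    n = 0
    for b in range(256):
        if b not in printable:
            table[chr(256 + n)] = b
            n += 1
    return table


_DECODER = byte_decoder()


def decode_token(token: str) -> str:
    """Map each character back to a byte, then decode the bytes as UTF-8."""
    raw = bytearray()
    for ch in token:
        if ch in _DECODER:
            raw.append(_DECODER[ch])
        else:
            # Assumption: characters outside the table are already readable
            # text, so they are passed through as their own UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    # Tokens that begin or end in the middle of a multi-byte character
    # cannot be fully decoded; the stray bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãģłãĤįãģĨ"))  # だろう
print(decode_token("ìĿ´ëĭ¤"))     # 이다
```

A token such as `¤ëĭ¤` starts with a lone continuation byte, so only its trailing character can be recovered; the leading byte decodes to the replacement character.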
Positive Logits
’re         0.61
're         0.55
’ve         0.48
've         0.48
are         0.43
’ll         0.37
'll         0.35
yourself    0.33
aren        0.33
guys        0.31
Activations Density 0.386%