INDEX
Explanations
Instances of the pronoun "I" and its variants in the text
Negative Logits
ãģ«ãģ¨    -0.18
iset      -0.16
ä¹ī       -0.14
Pretty    -0.14
alex      -0.14
ibraltar  -0.14
zw        -0.13
à¥Ģà¤Ĺ    -0.13
erer      -0.13
tribute   -0.13
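Entries such as `ãģ«ãģ¨`, `ä¹ī` and `à¥Ģà¤Ĺ` look garbled, but they have the shape of raw vocabulary strings from a byte-level BPE tokenizer, where each byte of the UTF-8 text is shown as a stand-in printable character. The sketch below assumes the GPT-2-style byte-to-character table (the source does not say which tokenizer produced these tokens) and reverses it to recover readable text. The names `bytes_to_unicode` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Build the GPT-2-style map from byte value (0-255) to a printable character.

    Assumption: the dashboard tokens use this mapping; the source does not confirm it.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a byte-level vocabulary string back to readable UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so invalid sequences are replaced rather than raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãģ«ãģ¨", "ä¹ī", "à¥Ģà¤Ĺ", "iset"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ãģ«ãģ¨` decodes to the Japanese `にと`, `ä¹ī` to the Chinese character `义`, and `à¥Ģà¤Ĺ` to the Devanagari sequence `ीग`. Plain ASCII tokens such as `iset` pass through unchanged.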
Positive Logits
heard     0.20
meant     0.19
told      0.16
Heard     0.15
hear      0.15
sorry     0.14
heard     0.14
bet       0.14
SOR       0.14
APPER     0.14
Activations Density 0.193%