Explanations
Occurrences of the word "I" and related personal pronouns indicating self-reference
Negative Logits
igo       -0.17
让我 ("let me")  -0.16
myself    -0.15
ーダ       -0.14
ceased    -0.14
ynet      -0.14
’da       -0.13
’na       -0.13
ι         -0.13
Admir     -0.13
Positive Logits
plan      0.23
plan      0.20
think     0.19
haven     0.18
Think     0.17
may       0.17
figure    0.17
finally   0.16
hope      0.16
iswa      0.16
Activation density: 0.229%