INDEX
Explanations
personal pronouns ('I', 'we', 'she') followed by specific related actions or thoughts
occurrences of the pronoun "I" and expressions of personal sentiment or agency
Negative Logits
undy         -0.69
oras         -0.66
覚醒         -0.64
ixt          -0.64
ormal        -0.63
代           -0.62
BaseType     -0.61
Journal      -0.59
ーテ         -0.59
ĺħ           -0.59
Positive Logits
nonetheless   1.45
nevertheless  1.43
also          1.07
still         1.06
didn          1.05
couldn        1.04
alas          1.03
ain           1.01
never         1.00
'll           0.97
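Dashboards of this kind typically build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that method; the source does not say how these values were produced, and the names `logit_effects`, `decoder_vec`, `unembed` and `vocab` are hypothetical. The toy numbers are made up and are not this feature's weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],        # hypothetical: feature's decoder direction, length d_model
    unembed: Sequence[Sequence[float]],  # hypothetical: W_U as d_model rows x vocab_size columns
    vocab: Sequence[str],                # token string for each column of unembed
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top_positive, top_negative) lists of (token, logit effect)."""
    d_model = len(decoder_vec)
    # Direct logit effect on each token: dot product of the decoder
    # direction with that token's unembedding column.
    effects = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Token ids ordered from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Reverse first so k=0 yields an empty list (order[-0:] would return everything).
    positive = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return positive, negative


# Toy usage with invented numbers: d_model = 2, vocabulary of 4 tokens.
decoder = [1.0, 0.5]
W_U = [[0.2, -1.0, 0.9, 0.0],
       [1.0, 0.4, -0.2, -0.6]]
vocab = ["still", "undy", "also", "ormal"]
pos, neg = logit_effects(decoder, W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens a vectorized matrix product would replace the nested loops.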
Activation density: 0.194%
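Activation density is commonly reported as the percentage of sampled token positions on which the feature's activation is above zero; on that reading, 0.194% means roughly 1.94 firing positions per 1,000 tokens. A minimal sketch assuming that definition; the function name `activation_density` and its `threshold` parameter are hypothetical, and the source does not state its threshold or sample size.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    # Count positions where the feature fires (strictly above the threshold).
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy usage: 2 firing positions out of 1,000.
acts = [0.0] * 998 + [3.1, 0.7]
print(f"Activation density: {activation_density(acts):.3f}%")  # prints "Activation density: 0.200%"
```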