INDEX
Explanations
personal pronouns and self-references
instances of the pronoun "I" and expressions of personal reflection or experience
Negative Logits
¿½            -0.63
Plaint        -0.63
ļéĨĴ          -0.63
%%%%          -0.57
ä¹ĭ           -0.56
Appearances   -0.55
Weak          -0.54
rules         -0.54
harms         -0.54
Children      -0.53
Positive Logits
'm            1.36
figured       1.23
've           1.09
decided       1.09
stumbled      1.08
realised      1.01
thought       1.00
am            1.00
realized      0.97
guess         0.96
Activations Density: 0.155%