INDEX
Explanations
the first-person singular pronoun 'I' followed by various verbs and expressions
instances of self-reference or expressions of personal experience
Negative Logits
senal       -0.71
cribed      -0.68
覚醒        -0.67
ommel       -0.67
naire       -0.66
ablished    -0.64
oras        -0.64
urated      -0.63
esc         -0.61
ーテ        -0.60
Positive Logits
nonetheless   1.35
nevertheless  1.29
ALSO          1.16
also          1.15
dig           1.05
gotta         1.04
still         1.02
ain           1.02
etheless      1.00
beware        0.98
Activations Density 0.222%