INDEX
Explanations
personal pronouns followed by a verb
first-person pronouns and expressions of personal experience
Negative Logits
ussen          -0.78
ãĥĬ            -0.76
isite          -0.72
aughtered      -0.71
phans          -0.71
asma           -0.69
guiActiveUn    -0.69
etheus         -0.69
²¾             -0.67
ENDED          -0.67
Positive Logits
'm             0.79
exagger        0.68
RL             0.67
coer           0.66
ufact          0.66
tub            0.66
flirt          0.65
tatt           0.64
've            0.64
ñ              0.63
Activations Density 0.276%