INDEX
Explanations
personal pronouns indicating ownership or affiliation
pronouns related to personal experience and perspective
Negative Logits
itialized   -0.74
quartered   -0.69
vine        -0.68
cru         -0.67
zyme        -0.66
orously     -0.66
Monstrous   -0.64
tyard       -0.63
ories       -0.63
ĸļ          -0.63
Positive Logits
sake        0.98
personally  0.97
liking      0.91
selves      0.89
purposes    0.89
self        0.79
learners    0.77
ummies      0.76
reasons     0.74
selves      0.72
Activations Density 0.110%