INDEX
Explanations
possessive pronouns and personal attributes
Negative Logits
_              0.24
a              0.23
with           0.23
=              0.22
an             0.21
("             0.21
-              0.19
r              0.19
}              0.19
{              0.18
Positive Logits
own            0.22
intentions     0.21
instincts      0.18
predicament    0.18
motivations    0.18
preferências   0.18  (Portuguese: "preferences")
personal       0.18
próprias       0.18  (Portuguese: "own", feminine plural)
propias        0.17  (Spanish: "own", feminine plural)
temperament    0.17
Activation density: 0.671%