Explanations
instances of first-person pronouns

Negative Logits
Token           Logit
,               -1.41
(not rendered)  -1.37
a               -1.33
the             -1.30
in              -1.28
-               -1.21
and             -1.20
of              -1.20
to              -1.14
an              -1.13

Positive Logits
Token           Logit
myſelf          2.47
Monfieur        2.01
Efq             1.96
itſelf          1.84
pleaſure        1.75
Theſe           1.74
doubtnut        1.72
―――――           1.72
^(@)            1.72
auffi           1.68

Activation density: 0.454%