INDEX
Explanations
names of people, particularly in contexts related to personal stories or experiences
Negative Logits
Token      Logit
-INF       -0.17
ÑĤÑİ       -0.16
ureen      -0.16
_ASSUME    -0.16
frau       -0.15
iddi       -0.14
aras       -0.14
«ĺ         -0.14
frauen     -0.14
ustain     -0.14
Positive Logits
Token      Logit
's         0.21
’s         0.20
from       0.19
&          0.19
B          0.18
and        0.18
the        0.18
O          0.17
T          0.17
R          0.17
Activations Density 0.664%