Explanations
people's first and last names
Negative Logits
1.04
0.96
,      0.88
(      0.84
a      0.84
.      0.79
↵      0.79
/      0.78
not    0.76
the    0.75
Positive Logits
<unused1207>    1.11
chyné           1.11
oniazid         1.10
<unused1893>    1.06
ufieurs         1.06
<unused2070>    1.05
arrerol         1.04
<unused1106>    1.04
<unused1092>    1.04
<unused1759>    1.03
Activations Density 0.025%