Explanations
proper nouns or named entities, although it also seems to have some sensitivity to titles
references to charitable organizations
Project, organization, and title names
Negative Logits
                  -1.15
,                 -1.02
the               -0.90
I                 -0.88
in                -0.88
to                -0.88
that              -0.87
↵                 -0.85
as                -0.82
(                 -0.81
Positive Logits
expandindo        1.54
Efq               1.45
Theſe             1.41
itſelf            1.35
disambiguazione   1.31
<unused43>        1.31
<unused14>        1.30
<unused16>        1.30
<unused8>         1.30
<pad>             1.30
Activations Density 3.431%