INDEX
Explanations
negations or the word "not."
Negative Logits
DockStyle    -1.04
يتيمه        -0.98
purpoſe      -0.96
Weyl         -0.96
uſe          -0.89
ſtate        -0.89
homoto       -0.89
Sopho        -0.87
Huguen       -0.86
recto        -0.86
Positive Logits
is           1.34
a            1.08
being        1.02
not          1.00
are          0.99
was          0.98
Is           0.97
quite        0.97
è            0.93
an           0.92
Activations Density 0.102%