INDEX
Explanations
conditional phrases involving expectations or obligations
Negative Logits
Shakspeare  -0.86
itſelf      -0.78
jména       -0.75
pośred      -0.72
Efq         -0.69
yoda        -0.69
Mahomet     -0.67
Pyrene      -0.67
Jefus       -0.67
sonne       -0.66
Positive Logits
the    1.35
a      0.98
its    0.88
be     0.87
their  0.86
our    0.86
TO     0.83
this   0.79
those  0.78
an     0.77
Activations Density: 0.566%