Explanations
punctuation and quotation marks in the text
Negative Logits
Token       Logit
Are         -0.47
gridy       -0.40
fromnode    -0.38
IF          -0.37
worfen      -0.37
but         -0.37
Or          -0.36
uxxxx       -0.36
gridx       -0.36
Be          -0.35

Positive Logits
Token       Logit
is           1.88
has          1.30
was          1.24
will         1.10
can          1.02
would        0.98
may          0.96
does         0.93
should       0.93
are          0.89

Activations Density: 0.438%
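Assuming the usual dashboard definition, activation density is the share of token positions on which the feature's activation is above zero. A density of 0.438% then means the feature fires on about 4.38 tokens per thousand. The sketch below is a hypothetical illustration of that calculation, not the dashboard's own code; the function name `activation_density` and the `threshold` parameter are assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Hypothetical helper: assumes density counts every position with a
    strictly positive activation (threshold=0.0) as "active".
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: one active position out of eight.
toy_acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(toy_acts):.3%}")  # 12.500%
```

On the same basis, 438 active positions out of 100,000 would give the 0.438% shown above.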