INDEX
Explanations
negative prefixes indicating opposition or rejection
Negative Logits
<unused42>    -1.09
<unused41>    -1.09
<unused16>    -1.09
<unused28>    -1.09
<unused3>     -1.09
[@BOS@]       -1.09
<unused8>     -1.09
<unused14>    -1.09
<unused43>    -1.09
<unused51>    -1.09
Positive Logits
the           0.83
against       0.43
↵↵            0.41
The           0.40
↵             0.39
the           0.38
my            0.37
Schild        0.36
The           0.36
our           0.34
Activations Density 0.016%