INDEX
    Explanations

    negative prefixes indicating opposition or rejection

    Negative Logits
    <unused42>    -1.09
    <unused41>    -1.09
    <unused16>    -1.09
    <unused28>    -1.09
    <unused3>     -1.09
    [@BOS@]       -1.09
    <unused8>     -1.09
    <unused14>    -1.09
    <unused43>    -1.09
    <unused51>    -1.09
    Positive Logits
    " the"        0.83
    " against"    0.43
    "↵↵"          0.41
    "The"         0.40
                  0.39
    "the"         0.38
    " my"         0.37
    " Schild"     0.36
    " The"        0.36
    " our"        0.34
    Act Density 0.016%

    No Known Activations
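
The activation density figure above can be read as a simple ratio. The sketch below assumes that density means the fraction of sampled token positions on which the feature fires (activation above zero); the page does not state its definition, so treat this as one plausible reading. The function name, the threshold parameter, and the sample data are all hypothetical and chosen only to reproduce a value of 0.016%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature is active. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active positions out of 12,500 tokens.
sample = [0.0] * 12_498 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.016%
```

Under this reading, a density of 0.016% means the feature fires on roughly 1 token in 6,250, which is consistent with a very sparse feature.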