    NEGATIVE LOGITS
    Token         Logit
    "*"           -0.84
    "TagMode"     -0.57
    " myſelf"     -0.57
    "!*"          -0.51
    " itſelf"     -0.51
    " ourselves"  -0.47
    " bess"       -0.47
    "#+#"         -0.47
    " mona"       -0.46
    " rhy"        -0.44
    POSITIVE LOGITS
    Token         Logit
    " The"        0.90
    " On"         0.81
    " If"         0.81
    " In"         0.80
    " A"          0.79
    " And"        0.78
    " It"         0.77
    " You"        0.77
    " To"         0.77
    " For"        0.76
    Activation Density: 0.302%

    No Known Activations