    NEGATIVE LOGITS
    Token               Logit
    " pleaſure"         -0.83
    " poffe"            -0.75
    " cauſe"            -0.74
    " Shakspeare"       -0.74
    "SequentialGroup"   -0.73
    " Efq"              -0.72
    " Pythagoras"       -0.72
    ":✨"                -0.71
    " itſelf"           -0.70
    " purpoſe"          -0.70
    POSITIVE LOGITS
    Token               Logit
    " these"            0.86
    " those"            0.86
    " the"              0.86
    " seus"             0.66
    " our"              0.65
    " them"             0.65
    " us"               0.63
    " his"              0.61
    " its"              0.60
    "those"             0.56
    Activation Density: 0.192%

    No Known Activations