    Explanations

    instances where a person is speaking or expressing themselves

    instances of the pronoun "I"

    Negative Logits
    Token         Logit
    " PTS"        -0.74
    "itol"        -0.68
    " Ele"        -0.62
    "imum"        -0.62
    " Virtue"     -0.61
    "ision"       -0.61
    "lihood"      -0.61
    "enges"       -0.60
    " Concord"    -0.59
    " airs"       -0.59

    Positive Logits
    Token         Logit
    "'ve"         1.22
    "'m"          1.19
    "'ll"         1.18
    " dunno"      1.07
    " forgot"     1.06
    "'d"          1.02
    " swear"      0.93
    "RL"          0.89
    " suppose"    0.88
    " cheated"    0.88

    Activation Density: 0.234%

    No Known Activations