INDEX
    Explanations

    descriptions of actions or suggestions related to helping or aiding others

    pronouns and actions related to potential or capability

    Negative Logits
    Token            Logit
    " sqor"          -0.78
    " millenn"       -0.65
    " partName"      -0.64
    " Cosponsors"    -0.62
    " attm"          -0.59
    " Arri"          -0.58
    " fraught"       -0.58
    " treacher"      -0.58
    " lately"        -0.57
    " Nay"           -0.57
    Positive Logits
    Token            Logit
    " can"           1.13
    " wouldn"        1.03
    " could"         1.02
    " wont"          1.02
    "'ll"            0.91
    " dont"          0.91
    " won"           0.85
    "sylv"           0.84
    " avoids"        0.80
    " maxim"         0.79
    Activation Density 0.154%

    No Known Activations