    Explanations

    personal pronouns and the word "you"

    Negative Logits
    Token         Logit
    " Gamb"       -0.64
    " Filip"      -0.64
    " Kang"       -0.64
    " Pratt"      -0.64
    "images"      -0.63
    " Patt"       -0.63
    " Kaine"      -0.62
    "entimes"     -0.62
    " Canaver"    -0.60
    " Lau"        -0.59
    Positive Logits
    Token         Logit
    "'re"         1.48
    "'ve"         1.28
    "'ll"         1.14
    "RS"          1.03
    "tub"         1.01
    "'d"          0.96
    "hei"         0.87
    "re"          0.85
    " guys"       0.82
    "tu"          0.82
    Activation Density: 0.193%

    No Known Activations