    Explanations

    personal pronouns followed by a statement

    NEGATIVE LOGITS
    "iquette"     -0.62
    " toget"      -0.61
    "ãĤ¼"         -0.60
    " Hels"       -0.60
    "Redditor"    -0.59
    "eatures"     -0.59
    "redients"    -0.57
    " è£ıè"       -0.56
    " Abyss"      -0.55
    "pires"       -0.54

    POSITIVE LOGITS
    " think"      1.42
    "'m"          1.37
    " mean"       1.25
    "'ve"         1.18
    " guess"      1.17
    " don"        1.16
    " suppose"    1.05
    " dunno"      1.04
    "'d"          1.02
    " wouldn"     1.02
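Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token values (the feature's direct effect on the output logits). The sketch below assumes that method and ignores any final layer-norm scaling; the helper names `logit_effects` and `top_k_tokens` and the toy weights are hypothetical and are not taken from this page or from any particular dashboard's code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Hypothetical helper: project a feature's decoder direction through the unembedding.

    decoder_dir has length d_model; unembed is a d_model x vocab_size matrix
    given as a list of rows. Returns a mapping token -> logit contribution.
    Assumption: no layer-norm scaling is applied before the projection.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per model dimension")
    return {
        tok: sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j, tok in enumerate(vocab)
    }


def top_k_tokens(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most positive, most negative) tokens, k of each."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]         # lowest values first
    positive = ranked[::-1][:k]   # highest values first
    return positive, negative


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not real model weights.
    vocab = [" think", " guess", "iquette"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [1.0, 0.0, -1.0],  # model dimension 0
        [0.0, 2.0, 1.0],   # model dimension 1
    ]
    effects = logit_effects(decoder_dir, unembed, vocab)
    positive, negative = top_k_tokens(effects, k=1)
    print("positive:", positive)
    print("negative:", negative)
```

Sorting the full vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) avoids ranking every entry.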
    Activation density: 0.177%
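Activation density is usually reported as the share of tokens in a sample dataset on which the feature's activation is above zero, so 0.177% corresponds to roughly 1.77 tokens in every thousand. A minimal sketch assuming that definition (a token counts only if its activation is strictly positive); `activation_density` is a hypothetical name and the sample activations are made up:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold.

    Assumption: a feature "fires" on a token when its activation is strictly
    greater than `threshold` (0.0 by default, as for a ReLU-style SAE).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Made-up activations over eight tokens; two of them are non-zero.
    sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
    print(f"{activation_density(sample):.3f}%")
```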

    No Known Activations