INDEX
    Explanations

    personal statements or opinions starting with "I"

    sentences that express personal identity or self-reference

    Negative Logits
    "tnc"                 -0.68
    "tains"               -0.66
    " indistinguishable"  -0.57
    " Rockefeller"        -0.57
    " Gap"                -0.55
    " Reverse"            -0.54
    "groupon"             -0.54
    " excess"             -0.54
    "pires"               -0.53
    " Philipp"            -0.53
    Positive Logits
    "'m"                   1.45
    "'ve"                  1.31
    " dunno"               1.22
    "'ll"                  1.22
    " suppose"             1.15
    "'d"                   1.06
    "nex"                  1.02
    " guess"               1.01
    "WI"                   1.00
    " mean"                0.97
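    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). This page does not say how the lists are computed. One common approach, assumed here, multiplies the feature's decoder direction by the model's unembedding matrix and keeps the k lowest and k highest scores. The sketch below uses plain Python with a made-up 4-token vocabulary and made-up weights. The helpers `logit_effects` and `top_tokens` and all of the numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Score each vocab token as decoder_dir @ W_U (assumed method).

    w_u is indexed [d_model][vocab], so the result has one score per token.
    """
    n_vocab = len(w_u[0])
    return [
        sum(d * row[v] for d, row in zip(decoder_dir, w_u))
        for v in range(n_vocab)
    ]


def top_tokens(scores: Sequence[float], vocab: Sequence[str], k: int,
               largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (or, if largest=False, smallest) scores."""
    order = sorted(range(len(scores)), key=lambda v: scores[v], reverse=largest)
    return [(vocab[v], round(scores[v], 2)) for v in order[:k]]


# Hypothetical toy model: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 0.5]
w_u = [
    [1.0, 0.2, -0.4, 0.0],   # unembedding row for residual dimension 0
    [0.8, -0.2, -0.6, 0.4],  # unembedding row for residual dimension 1
]
scores = logit_effects(decoder_dir, w_u)
print("positive:", top_tokens(scores, vocab, 2))
print("negative:", top_tokens(scores, vocab, 2, largest=False))
```

    A real model would run the same computation over the full vocabulary, usually with an array library. The pure-Python loops here only keep the sketch dependency-free.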
    Activation Density 0.233%
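    For scale, a density of 0.233% means the feature is active on roughly 2.3 of every 1,000 tokens (0.00233 × 1,000 ≈ 2.3). The sketch below assumes that density is the percentage of tokens with a strictly positive activation. The `activation_density` helper and the sample values are hypothetical illustrations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens on which the feature fires.

    Assumption: "fires" means a strictly positive activation value.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > 0)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 233 nonzero activations out of 100,000 tokens.
sample = [0.0] * 99_767 + [0.5] * 233
print(f"{activation_density(sample):.3f}%")
```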

    No Known Activations