INDEX
    Explanations

    instances of personal decisions or opinions in text

    Negative Logits (token, value)
        " themselves"     -0.68
        " respectively"   -0.64
        "Their"           -0.60
        "their"           -0.59
        "EMS"             -0.55
        " Autob"          -0.55
        " alike"          -0.54
        " allegedly"      -0.54
        " Their"          -0.54
        " their"          -0.53
    Positive Logits (token, value)
        " myself"         1.54
        " my"             0.91
        "poke"            0.72
        "ograp"           0.64
        " personally"     0.61
        " fuckin"         0.61
        "chair"           0.58
        " MY"             0.58
        "milo"            0.57
        "laughs"          0.56
    Activation Density: 0.896%

    No Known Activations