    Explanations

    Personal pronouns referring to individuals or groups, as well as actions performed by those individuals or groups.

    Negative Logits
    Token          Logit
    " increa"      -1.60
    " reluct"      -1.60
    " fuf"         -1.56
    " apprehen"    -1.53
    " disagre"     -1.51
    " depic"       -1.48
    " impra"       -1.48
    " emphat"      -1.46
    " accla"       -1.43
    " suscep"      -1.41
    Positive Logits
    Token          Logit
    "selves"       0.87
    " herself"     0.87
    " himself"     0.87
    "self"         0.85
    " themselves"  0.85
    " yourself"    0.85
    " Himself"     0.82
    " ourselves"   0.81
    "SELF"         0.79
    " myself"      0.77
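    The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The page does not say how these scores are computed. A common convention for sparse-autoencoder feature dashboards is to project the feature's output (decoder) direction onto the model's unembedding matrix and keep the most negative and most positive entries. The minimal sketch below assumes that convention and ignores any final layer norm; the function name `logit_effects` and the toy numbers are hypothetical and are not the values listed above.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive token scores.

    Assumption (hypothetical convention): a token's score is the dot
    product of the feature's output direction with that token's column
    of the unembedding matrix (shape d_model x vocab_size).
    Layer norm and biases are ignored.
    """
    scores = []
    for t, token in enumerate(vocab):
        score = sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens first
    positive = ranked[::-1][:k]  # most promoted tokens first
    return negative, positive


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = [" himself", " herself", " reluct", " increa"]
unembed = [
    [0.9, 0.8, -1.6, -1.5],
    [0.1, -0.2, 0.3, 0.0],
]
direction = [1.0, 0.0]
negative, positive = logit_effects(direction, unembed, vocab, k=2)
print(negative)
print(positive)
```

    Sorting once and slicing from both ends keeps the sketch short; a real vocabulary of tens of thousands of tokens would more likely use a vectorized matrix product and a partial top-k selection.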
    Act Density 0.110%
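    Act Density most likely gives the share of token positions on which the feature is active, so 0.110% corresponds to roughly 11 firing tokens per 10,000. The page does not define the firing criterion; the sketch below assumes a token counts when its activation is strictly greater than zero. The helper `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds threshold.

    Assumption: "firing" means activation > threshold (0 by default).
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 11 firing positions out of 10,000 tokens.
sample = [0.0] * 9_989 + [1.0] * 11
print(f"Activation density: {activation_density(sample):.3f}%")  # Activation density: 0.110%
```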

    No Known Activations