INDEX
    Explanations

    personal pronouns followed by a verb

    first-person pronouns and expressions of personal experience

    Negative Logits
    ussen           -0.78
    ãĥĬ             -0.76
    isite           -0.72
    aughtered       -0.71
    phans           -0.71
    asma            -0.69
     guiActiveUn    -0.69
    etheus          -0.69
    ²¾              -0.67
    ENDED           -0.67
    Positive Logits
    'm              0.79
     exagger        0.68
    RL              0.67
     coer           0.66
    ufact           0.66
    tub             0.66
     flirt          0.65
     tatt           0.64
    've             0.64
    ñ               0.63
    Activation Density: 0.276%

    No Known Activations