INDEX
    Explanations

    phrases that emphasize the use of the first-person pronoun "I."

    Negative Logits
    "lopen"      -0.08
    "bilt"       -0.08
    "áp"         -0.08
    " itself"    -0.07
    "bidden"     -0.07
    "ارس"        -0.07
    "gether"     -0.07
    "ousand"     -0.07
    "lx"         -0.07
    "ENAME"      -0.07
    Positive Logits
    "’m"         0.11
    "'m"         0.11
    "'ve"        0.10
    "’ve"        0.09
    " myself"    0.09
    " am"        0.09
    "zzo"        0.09
    "'ll"        0.09
    " буду"      0.08
    "zelf"       0.08
    Activation Density: 0.115%
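Activation density is usually read as the percentage of sampled token positions on which the feature fires, i.e. density = (firing positions / total positions) × 100. The source does not specify the sample or the firing threshold. The sketch below therefore assumes "fires" means an activation above zero and uses a hypothetical sample. In that sample, 23 firing positions out of 20,000 give the same 0.115% figure.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 20,000 token positions, the feature fires on 23 of them.
acts = [0.0] * 20000
for i in range(23):
    acts[i * 800] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.115%
```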

    No Known Activations