INDEX
    Explanations

    pronouns referring to oneself or themselves

    reflexive pronouns and phrases related to self-reference

    Negative Logits
    Token       Logit
    onal        -0.64
     Mub        -0.63
    cru         -0.61
    iens        -0.60
    aptic       -0.60
    microsoft   -0.60
    itty        -0.59
    grade       -0.59
     Alger      -0.58
    emis        -0.57
    Positive Logits
    Token       Logit
    é¾įåĸļ士     0.75
    ãĥĥãĥĪ      0.74
    ãĥķ         0.73
    ãĤĭ         0.71
    ãģ¾         0.70
    ãģı         0.69
    è»          0.68
    çīĪ         0.67
    åĤ          0.66
    ãĥĹ         0.65
    Activation Density: 0.042%

    No Known Activations