    Explanations

    pronouns indicating self-directed actions, particularly emphasizing belief in oneself

    Negative Logits
        -0.63  " Mub"
        -0.62  "microsoft"
        -0.62  "onal"
        -0.62  " Nou"
        -0.60  " Lens"
        -0.59  "itty"
        -0.58  "ency"
        -0.57  "cru"
        -0.57  " Alger"
        -0.57  "grade"
    Positive Logits
        0.71  "ortium"
        0.71  "ãģ¾" (ま)
        0.70  "ãģı" (く)
        0.70  "ãĤĭ" (る)
        0.70  "ãĥķ" (フ)
        0.70  "é¾įåĸļ士" (龍喚士)
        0.68  " sanct"
        0.67  "ãĥĥãĥĪ" (ット)
        0.66  "ãģį" (き)
        0.66  "çīĪ" (版)
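    Several positive-logit tokens appear as GPT-2-style byte-level BPE strings, in which every raw byte is replaced by a printable stand-in character. Decoding reverses that mapping: `ãģ¾` stands for the bytes E3 81 BE, which is UTF-8 for ま. The sketch below reproduces that step, assuming the model uses GPT-2's byte-to-unicode table (the page does not name the tokenizer). `decode_token` is a hypothetical helper, and its fallback for characters outside the table (the literal space in `" sanct"`, or the already-rendered `士` in `é¾įåĸļ士`) is an assumption made for partly decoded display strings.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable stand-in table, as in GPT-2's byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (controls, space, 0x7F-0xA0, 0xAD) are shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable stand-in character -> original byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed BPE token back into text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Assumed fallback: the character was already shown decoded,
            # so re-encode it as UTF-8 and keep its bytes.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãģ¾", "ãĥĥãĥĪ", "é¾įåĸļ士", "çīĪ", " sanct"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

    The fallback matters for `é¾įåĸļ士`: its first six characters decode to 龍喚, and the trailing 士 is re-encoded as its own UTF-8 bytes, giving 龍喚士.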
    Activation Density: 0.040%
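    Activation density is usually read as the share of sampled tokens on which the feature fires, so 0.040% corresponds to roughly 4 firing tokens per 10,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the page states neither the threshold nor the size of the token sample); `activation_density` is a hypothetical name.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "firing" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Illustrative sample: 4 firing tokens out of 10,000.
    sample = [0.0] * 9996 + [1.2, 0.5, 3.1, 0.8]
    print(f"{activation_density(sample):.3f}%")  # prints 0.040%
```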

    No Known Activations