INDEX
    Explanations

    Occurrences of the word "I" and related first-person pronouns indicating self-reference.

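    Tokens are quoted; a leading space inside the quotes is part of the token.

    Negative Logits
    Token         Logit
    "igo"         -0.17
    "让我"        -0.16   (Chinese for "let me")
    " myself"     -0.15
    "ーダ"        -0.14
    "ceased"      -0.14
    "ynet"        -0.14
    "’da"         -0.13
    "’na"         -0.13
    "ι"           -0.13
    " Admir"      -0.13

    Positive Logits
    Token         Logit
    " plan"        0.23
    "plan"         0.20
    " think"       0.19
    " haven"       0.18
    "Think"        0.17
    "may"          0.17
    " figure"      0.17
    " finally"     0.16
    " hope"        0.16
    "iswa"         0.16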
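    Positive and negative logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes exactly that and ignores any final LayerNorm. `logit_effects` and its arguments are hypothetical names, not this dashboard's actual code, and plain Python lists stand in for real weight tensors.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Score every vocabulary token by its dot product with a feature direction.

    Hypothetical helper (assumption): unembed[i] is the d_model-length
    unembedding vector for vocab[i]; no final normalization is applied.
    Returns (top-k most positive, top-k most negative) (token, logit) pairs.
    """
    # Dot product of the decoder direction with each token's unembedding vector.
    scores = [
        (tok, sum(d * u for d, u in zip(decoder_vec, col)))
        for tok, col in zip(vocab, unembed)
    ]
    # Sort ascending by score: the front holds the most negative tokens.
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative
```

    With a toy decoder direction [1.0, 0.0], unembedding vectors [[0.2, 0.5], [-0.1, 0.3], [0.05, 0.9]] for the tokens " think", " myself" and " plan", and k=1, the top positive token is " think" (0.2) and the top negative token is " myself" (-0.1).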
    Activation density: 0.229%
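    Activation density is usually reported as the percentage of tokens on which the feature fires, so 0.229% corresponds to roughly 2.3 tokens per 1,000. A minimal sketch, assuming a feature "fires" when its activation is strictly above a hypothetical `threshold` parameter that defaults to 0:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper (assumption): "active" means activation > threshold.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

    For example, 1,000 tokens of which 3 have a positive activation give a density of 0.3%.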

    No Known Activations