INDEX
    Explanations

    names ending in 'i'

    the pronoun "I" and, relatedly, references to self or identity

    Negative Logits
    pter                     -0.77
     ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³    -0.71
    lisher                   -0.70
    cffff                    -0.69
    eatures                  -0.68
    imentary                 -0.67
    */(                      -0.65
    ilater                   -0.65
    sburg                    -0.65
    lain                     -0.64
    Positive Logits
    Äĩ                       1.13
    orno                     1.09
    plom                     1.00
    ples                     1.00
    ère                      0.99
    ye                       0.98
    ota                      0.96
    oti                      0.96
    pling                    0.93
    uli                      0.92
    Act Density 0.056%

    No Known Activations