    Explanations

    proper nouns, specifically names related to people and characters

    Negative Logits
    Token         Logit
    " Shade"      -0.16
    "ixin"        -0.15
    "331"         -0.15
    "ãĥªãĤ¹"      -0.14
    "bris"        -0.14
    " shade"      -0.14
    "Ñıб"        -0.14
    "back"        -0.14
    "illus"       -0.13
    "ngr"         -0.13
    Positive Logits
    Token         Logit
    "inke"        0.17
    "dü"         0.15
    "ald"         0.15
    "ãĥ³ãĤ¯"      0.15
    "asz"         0.14
    "ุà¸ĩ"     0.14
    "Ïĥε"        0.14
    "yk"          0.14
    "isci"        0.14
    "atrix"       0.13
    Act Density 0.028%

    No Known Activations
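
Several tokens in the logit lists above (for example "ãĥªãĤ¹" and "Ïĥε") look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below shows one way to map such strings back to text. It assumes the underlying tokenizer uses the GPT-2 style byte-to-character table, which the page does not state. The names `bytes_to_unicode`, `CHAR_TO_BYTE`, and `decode_token` are illustrative only and are not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as used by GPT-2 style byte-level BPE.

    Assumption: the tokens on this page come from a tokenizer using this table.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            # Non-printable bytes are remapped to code points starting at 256.
            table[b] = chr(256 + shift)
            shift += 1
    return table


CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Turn a vocabulary string back into text (hypothetical helper)."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Character is already plain text (e.g. a literal space): keep it.
            raw.extend(ch.encode("utf-8"))
    # A token may end in the middle of a multi-byte character, so do not raise.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥªãĤ¹", "Ïĥε", " Shade", "inke"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Tokens made only of ASCII letters pass through unchanged, so the helper can be applied to every row of both lists. Because a single token can end partway through a multi-byte character, some decoded results may contain the replacement character.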