INDEX
    Explanations

    words or suffixes related to names and titles

    Negative Logits
    Token    Logit
    nd       -0.26
    line     -0.21
    rou      -0.20
    nds      -0.20
    st       -0.19
    ness     -0.19
    nya      -0.18
    shan     -0.18
    na       -0.18
    nde      -0.17
    Positive Logits
    Token      Logit
    utenant    0.24
    ght        0.24
    lectric    0.23
    =edge      0.21
    vements    0.20
    =UTF       0.20
    zsche      0.19
    gos        0.19
    gh         0.18
    ÃŁen       0.18
    Activation Density: 0.077%

    No Known Activations