INDEX
    Explanations

    proper nouns, particularly names and organizations

    Negative Logits
    ephir     -0.16
    _slow     -0.15
    earch     -0.15
    Slow      -0.15
    emark     -0.15
    ashed     -0.14
    æŀĿ       -0.14
    मन        -0.14
    lub       -0.14
    opis      -0.14
    Positive Logits
    ki        0.24
    en        0.23
    ky        0.22
    ka        0.18
    song      0.17
    ons       0.17
    yah       0.17
    ens       0.16
    zc        0.15
    hte       0.15
    Activation Density: 0.065%

    No Known Activations