INDEX
    Explanations

    proper nouns, especially names of authors and characters in literature

    Negative Logits
    .fast         -0.14
    .vec          -0.14
    >ID           -0.14
    adaki         -0.14
    گاÙĨ          -0.14
     ubiquitous   -0.13
    %X            -0.13
    otre          -0.13
    akan          -0.13
    .failure      -0.13
    Positive Logits
    pty           0.15
    verter        0.15
     Commod       0.15
     assim        0.15
    setattr       0.14
     bast         0.14
    phins         0.14
     imp          0.14
    ño            0.13
    heavy         0.13
    Activation Density: 0.050%

    No Known Activations