    Explanations

    proper nouns indicating individuals or organizations

    Negative Logits
    addir    -0.19
    čet      -0.17
    AGR      -0.17
    unos     -0.17
    abad     -0.17
    ovat     -0.16
    ODEV     -0.16
    adol     -0.16
    ikit     -0.16
    heap     -0.16
    Positive Logits
    bs       0.33
    gs       0.29
    ps       0.27
    fs       0.26
    ng       0.26
    hs       0.26
    kses     0.25
    ff       0.25
    ck       0.23
    ds       0.23
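One common way to obtain positive and negative logit lists like these, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with every token's unembedding vector and keep the highest and lowest scores. The sketch below uses hypothetical helpers (`logit_effects`, `top_and_bottom`) and toy three-dimensional vectors for a made-up `tok_*` vocabulary; none of it is this feature's actual weights.

```python
# Minimal sketch (assumption): a feature's direct logit effect on a token is
# the dot product of the feature's decoder direction with that token's
# unembedding vector. All vectors and token names below are hypothetical.

def logit_effects(decoder_dir, unembed_rows):
    """Return {token: dot(decoder_dir, unembedding_vector)} for every token."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed_rows.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    # Reorder the tail so the most negative score comes first.
    negative = sorted(ranked[-k:], key=lambda kv: kv[1])
    return positive, negative


# Hypothetical toy data: a 3-d decoder direction and a 5-token vocabulary.
decoder_dir = [0.6, -0.2, 0.1]
unembed_rows = {
    "tok_a": [0.5, -0.1, 0.2],
    "tok_b": [0.4, -0.2, 0.0],
    "tok_c": [-0.3, 0.1, 0.0],
    "tok_d": [-0.2, 0.2, -0.1],
    "tok_e": [0.0, 0.1, 0.1],
}

effects = logit_effects(decoder_dir, unembed_rows)
positive, negative = top_and_bottom(effects, k=2)

for tok, score in positive:
    print(f"positive {tok} {score:+.2f}")
for tok, score in negative:
    print(f"negative {tok} {score:+.2f}")
```

In a real model the vocabulary has tens of thousands of entries, so the same ranking step is what reduces the full score vector to the ten tokens shown in each list.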
    Activation Density: 0.025%
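Activation density is commonly read as the fraction of sampled token positions on which the feature fires, i.e. has an activation above zero; that definition is an assumption here, as this page does not state it. Under it, 0.025% corresponds to roughly one firing position per 4,000 tokens. A minimal sketch with a hypothetical `activation_density` helper and made-up activations:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "firing" means activation > 0; other dashboards may use a
    different threshold or sample set.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 1 of 4,000 token positions.
acts = [0.0] * 3999 + [2.5]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.025%
```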

    No Known Activations