    Explanations

    References to the concept of "none" or "nothingness."

    Negative Logits
    Token          Logit
    "roc"          -0.20
    "стю"          -0.16
    "enga"         -0.15
    "shint"        -0.15
    "گاهی"         -0.14
    "ゥ"           -0.14
    "nga"          -0.14
    "ENCES"        -0.14
    "dependent"    -0.14
    "riority"      -0.14
    Positive Logits
    Token          Logit
    "none"          0.24
    "theless"       0.23
    "-too"          0.21
    "-the"          0.21
    " other"        0.20
    " NONE"         0.20
    "NONE"          0.20
    "-none"         0.20
    "/all"          0.20
    " none"         0.20
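    The page does not say how its logit scores were computed. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction onto the model's unembedding and rank vocabulary tokens by the resulting score. The following is only a minimal sketch under that assumption: `logit_effects`, `top_logits`, and the toy vectors are hypothetical and invented for illustration, not the dashboard's code or data.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (hypothetical helper)."""
    return {
        token: sum(d * u for d, u in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_logits(effects: Dict[str, float], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k most promoted (positive=True) or most suppressed
    (positive=False) tokens, ordered by score."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy 2-dimensional example; all vectors are invented for illustration.
direction = [1.0, 0.0]
unembed = {
    "none": [0.24, 0.1],
    " other": [0.20, 0.5],
    "roc": [-0.20, 0.3],
    "enga": [-0.15, 0.0],
}
effects = logit_effects(direction, unembed)
print(top_logits(effects, 2))                  # most promoted tokens
print(top_logits(effects, 2, positive=False))  # most suppressed tokens
```

    Under this reading, a negative score means the feature pushes that token's logit down, so the "Negative Logits" list holds the most negative scores and the "Positive Logits" list the most positive ones.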
    Activation Density: 0.011%
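    Activation density is usually reported as the share of sampled tokens on which the feature fires; 0.011% corresponds to roughly 11 tokens per 100,000. The sketch below assumes that "firing" means an activation strictly above zero; `activation_density` and the sample data are hypothetical, not taken from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (hypothetical helper; assumes 'firing' means activation > threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Invented sample: 11 firing tokens out of 100,000.
sample = [0.0] * 99_989 + [1.5] * 11
print(f"{activation_density(sample):.3f}%")  # prints 0.011%
```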

    No Known Activations