    Explanations

    references to typical characteristics or common occurrences

    Negative Logits
    Token       Logit
    "ализи"     -0.14
    "andy"      -0.14
    "latin"     -0.14
    "uhan"      -0.14
    "بت"        -0.14
    "bart"      -0.14
    "imbus"     -0.14
    "uel"       -0.14
    " flo"      -0.13
    "eut"       -0.13
    Positive Logits
    Token             Logit
    " usual"          0.30
    "usual"           0.25
    " typical"        0.24
    " fare"           0.21
    " suspects"       0.21
    "typ"             0.20
    " traditional"    0.18
    " standard"       0.18
    "-standard"       0.17
    "Typ"             0.17
    Activation Density: 0.194%

    No Known Activations