INDEX
    Explanations

    references to individual items and their unique characteristics

    NEGATIVE LOGITS
     weren
    -0.17
    ัย
    -0.16
    一切
    -0.16
    avail
    -0.16
    swick
    -0.15
     tidak
    -0.15
     frequently
    -0.15
     вообще
    -0.15
    okus
    -0.15
     doesn
    -0.15
    POSITIVE LOGITS
     unique
    0.32
    unique
    0.28
     Unique
    0.26
     uniqueness
    0.26
     differently
    0.26
     respective
    0.25
    Unique
    0.25
     individually
    0.25
    .unique
    0.24
     respectively
    0.24
    Activation Density: 0.238%

    No Known Activations