    Explanations

    enough margin to evaluate

    Negative Logits
    endas
    0.73
     außerdem
    0.71
     Nakh
    0.71
    Очень
    0.70
    ñar
    0.67
     Variable
    0.66
    েও
    0.66
     albo
    0.65
     πλή
    0.65
    ة
    0.64
    Positive Logits
    டக்கலை
    1.08
    ને
    1.03
    0.96
    یا
    0.96
    0.93
    ట్టు
    0.92
    ни
    0.91
    тор
    0.91
     disinterested
    0.91
     начинают
    0.90
    Act Density 0.000%

    No Known Activations