INDEX
    Explanations

    nationalities and languages

    Negative Logits
    Token              Value
    (whitespace)       0.84
    " eventuali"       0.76
    (not shown)        0.72
    " một"             0.71
    " एक"              0.67
    " einem"           0.66
    " eine"            0.64
    " अन्य"             0.64
    " એક"              0.62
    "<unused2128>"     0.62
    Positive Logits
    Token    Value
    y        1.14
    el       1.06
    al       1.04
    an       1.03
    on       1.02
    as       1.02
    ol       1.01
    us       1.00
    u        1.00
    ar       0.98
    Activation Density: 1.669%

    No Known Activations