INDEX
    Explanations

    countries and nationalities

    Negative Logits
    ur              0.43
    ul              0.42
    i               0.39
    el              0.39
    u               0.38
    end             0.38
    id              0.37
     Hollywood      0.37
    ib              0.36
    v               0.36
    Positive Logits
     נ              0.43
    jší             0.43
     penso          0.43
     ב              0.42
     hacemos        0.42
    ואה             0.41
    geschichte      0.40
     പൊതു           0.40
                    0.40
     complicate     0.40
    Act Density 0.099%

    No Known Activations