INDEX
    Explanations

    multiple languages

    Negative Logits
    Token             Logit
    " ا"              -0.08
    "sp"              -0.08
    "spy"             -0.07
    "stos"            -0.07
    " =$"             -0.07
    ".scalar"         -0.07
    "628"             -0.07
    " תודה"           -0.07
    "uluka"           -0.07
    "sl"              -0.07

    Positive Logits
    Token             Logit
    "Â"               0.10
    "â"               0.09
    " rero"           0.08
    " particolare"    0.08
    "Ã"               0.08
    " voici"          0.08
    "ريون"            0.07
    " Â"              0.07
    " inderdaad"      0.07
    "ંગ્રેસ"            0.07
    Activation Density: 0.155%

    No Known Activations