    Explanations

    Hebrew and related languages

    Negative Logits
     Políticas  1.18
    على  1.01
     médioc  0.93
     políticas  0.92
     đẩy  0.91
     inglés  0.89
     Sénégal  0.89
     avoir  0.89
    ا  0.88
     sofá  0.88
    Positive Logits
    Bers  0.86
    !’  0.77
    к  0.74
    原因是  0.73
    ?”  0.73
    lles  0.73
    )’  0.73
    !”  0.72
    קים  0.71
    ק  0.71
    Activation Density: 0.002%

    No Known Activations