    Negative Logits
    Token           Logit
    " genio"        -1.23
    (not shown)     -1.15
    " dente"        -1.14
    " sirena"       -1.13
    "Langkah"       -1.13
    (not shown)     -1.11
    " داشتند"       -1.11
    " sacerd"       -1.09
    (not shown)     -1.08
    "Minden"        -1.06
    Positive Logits
    Token           Logit
    ","             1.06
    "'"             0.99
    (whitespace)    0.96
    "H"             0.92
    (whitespace)    0.91
    "fi"            0.91
    (whitespace)    0.88
    "ерез"          0.88
    "?"             0.88
    (not shown)     0.87
    Act Density 0.043%

    No Known Activations