    Negative Logits
     prisão       -0.08
     prison       -0.08
    əd            -0.08
                  -0.07
     Hotel        -0.07
     troubled     -0.07
     farklı       -0.07
    igma          -0.07
     hospital     -0.07
    ️⃣            -0.07
    Positive Logits
     تنا          0.09
    لن            0.08
     schme        0.08
    ును           0.08
     sauf         0.08
     хорошо       0.07
     تل           0.07
     عادة         0.07
     proceedings  0.07
     swallow      0.07
    Activation Density 0.001%

    No Known Activations