INDEX
    Explanations
    Negative Logits
    Đi    -0.08
     સમાવ    -0.08
    Announcement    -0.08
    Trat    -0.08
     senators    -0.07
    Disney    -0.07
    рив    -0.07
    Adres    -0.07
     senator    -0.07
    vog    -0.07
    Positive Logits
     benutzen    0.09
     doigts    0.08
     fingers    0.08
     Thomas    0.08
     Diy    0.07
     Crafts    0.07
     воды    0.07
    utils    0.07
     หลัง    0.07
     цветов    0.07
    Activation Density 0.013%

    No Known Activations