    Explanations: none found
    Negative Logits
    "ли"           0.98
    " "            0.96
    "rinos"        0.95
    "er"           0.90
    " polych"      0.90
    " EP"          0.89
    " Pleasure"    0.88
    "рован"        0.87
    " plaques"     0.87
    " Concerns"    0.87
    Positive Logits
    "speople"      1.26
    "ت"            1.22
    "sière"        1.20
    (token not rendered)  1.19
    "கரு"          1.18
    " гуляць"      1.16
    "Tidak"        1.16
    " délais"      1.16
    "میان"         1.16
    "miktar"       1.14
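    The logit lists above can plausibly be produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the result. This page does not confirm that method, so treat the following as a minimal sketch under that assumption. `w_dec`, `W_U`, `vocab` and `top_logits` are hypothetical names. The sketch keeps signs, so negative entries come out below zero, whereas the list above shows them without a minus sign.

```python
import numpy as np

def top_logits(w_dec, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, value) pairs.

    Assumption: w_dec is the feature's decoder direction, shape (d_model,),
    and W_U is the model's unembedding matrix, shape (d_model, vocab_size).
    vocab is a list of token strings of length vocab_size (hypothetical).
    """
    # Direct contribution of the feature direction to every output logit
    logits = w_dec @ W_U  # shape (vocab_size,)
    order = np.argsort(logits)  # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage with a 2-dimensional model and a 4-token vocabulary
w_dec = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
pos, neg = top_logits(w_dec, W_U, ["a", "b", "c", "d"], k=2)
print(pos, neg)
```

    Sorting once with `argsort` and reading both ends gives the top and bottom lists together. For a full vocabulary, `np.argpartition` would avoid a complete sort.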
    Activation Density: 0.000%

    No Known Activations
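    A density of 0.000% is consistent with the feature having no recorded activations. Below is a minimal sketch of how such a figure is typically computed: the percentage of tokens on which the feature's activation exceeds a threshold. The zero threshold and the names `acts` and `activation_density` are assumptions, not taken from this page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens on which a feature fires.

    acts: hypothetical array of the feature's activations, one per token.
    threshold: activations strictly above this value count as firing (assumed 0).
    """
    acts = np.asarray(acts, dtype=float)
    if acts.size == 0:
        return 0.0  # no tokens observed, so report zero density
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# A feature that never fires over 1,000 tokens has zero density
print(f"Act Density {activation_density(np.zeros(1000)):.3f}%")
```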