    Negative Logits
    ‍♀️            2.06
     سازی         1.79
    Какие         1.76
    പി            1.75
    и             1.74
     irony        1.72
    k             1.72
     walang       1.69
    НГ            1.66
    ipping        1.66
    Positive Logits
     върху        2.00
    sofar         1.84
     воздействия  1.77
     crater       1.73
     adversely    1.72
     marquée      1.71
     decisively   1.69
    fillStyle     1.68
    ahiran        1.63
    лиз           1.62
    Activation Density: 0.467%

    No Known Activations