INDEX
    Explanations

    coefficients

    Negative Logits
    "mk"         -0.09
    "નો"         -0.08
    "iol"        -0.08
    "Overview"   -0.08
    "haupt"      -0.08
    "gly"        -0.07
    " Overview"  -0.07
    "interp"     -0.07
    "nb"         -0.07
    "wner"       -0.07
    Positive Logits
    " шавад"   0.09
    " kisim"   0.08
    " supaya"  0.08
    " کړل"     0.08
    "алам"     0.08
    "сақ"      0.08
    " պատվ"    0.08
    "რდ"       0.08
    " карда"   0.08
    " ბაზ"     0.08
    Act Density 0.009%

    No Known Activations