    NEGATIVE LOGITS
    Token      Value
    "t"        1.34
    "l"        1.01
    "r"        0.95
    "ation"    0.94
    "б"        0.93
    "ون"       0.92
    "ik"       0.91
    "im"       0.88
    "ell"      0.86
    "ov"       0.86

    POSITIVE LOGITS
    Token          Value
    " relieved"    1.00
    "ما"           0.96
    " relieves"    0.94
    "дка"          0.91
    "ње"           0.91
    "قي"           0.86
    "ме"           0.83
    " pampered"    0.82
    "اد"           0.81
    (not shown)    0.80

    Activation Density: 0.002%

    No Known Activations