INDEX
    Explanations
    Negative Logits
    "ans"             -0.09
    " gaze"           -0.08
    "arr"             -0.08
    " coff"           -0.08
    "анс"             -0.08
    "annels"          -0.08
    " slowly"         -0.08
    " joue"           -0.08
    " remuneration"   -0.07
    " repaint"        -0.07
    Positive Logits
    "يم"              0.12
    "ِي"              0.09
    "واعد"            0.09
    "ింత"             0.09
    " این"            0.09
    ".condition"      0.09
    " FLEX"           0.08
    "ীৰ"              0.08
    "ِل"              0.08
    " ea"             0.08
    Activation Density: 0.001%

    No Known Activations