    NEGATIVE LOGITS
    "=d"  -0.08
    "784"  -0.07
    "оты"  -0.07
    "_counts"  -0.06
    ".radians"  -0.06
    " حالت"  -0.06
    "]*"  -0.06
    "untary"  -0.06
    " актив"  -0.06
    " суд"  -0.06
    POSITIVE LOGITS
    "ish"  0.12
    "ASH"  0.11
    "ash"  0.11
    "ISH"  0.11
    "ush"  0.11
    "sh"  0.10
    " Ash"  0.10
    "osh"  0.09
    "Ash"  0.09
    "SH"  0.09
    Act Density 0.126%

    No Known Activations