INDEX
    Explanations
    NEGATIVE LOGITS
    Logit   Token
    -0.07   "ход"
    -0.06   " 이루"
    -0.06   "_sample"
    -0.06   " Kou"
    -0.06   " FDA"
    -0.06   ";padding"
    -0.06   "ittance"
    -0.06   "교회"
    -0.06   "enny"
    -0.06   "etre"
    POSITIVE LOGITS
    Logit   Token
    0.06    "博士"
    0.06    " expr"
    0.06    "ним"
    0.06    " political"
    0.06    " دستور"
    0.06    "}↵↵"
    0.06    " younger"
    0.06    " embarked"
    0.06    " Qing"
    0.06    " dissatisfaction"
    Activation Density: 0.008%

    No Known Activations