    Explanations

    you want like

    Negative Logits
    _bg
    -0.07
    amil
    -0.06
    844
    -0.06
    ‌دهد
    -0.06
     Burst
    -0.06
    álo
    -0.06
    (INFO
    -0.06
    лены
    -0.06
     Tops
    -0.06
     threat
    -0.06
    Positive Logits
     stationary
    0.07
    coli
    0.06
    pared
    0.06
     yg
    0.06
     comando
    0.06
    0.06
     lum
    0.06
    EEK
    0.06
     произ
    0.06
     components
    0.06
    Act Density 0.029%

    No Known Activations