INDEX
    Explanations
    Negative Logits
     mammals      -0.07
     Pant         -0.07
     respects     -0.07
    ющего         -0.06
     conspic      -0.06
     acept        -0.06
    ihn           -0.06
     Virt         -0.06
    Compat        -0.06
    alarından     -0.06
    Positive Logits
     sl           0.08
    (script       0.08
     Honduras     0.07
     slag         0.07
    ServerError   0.07
     dedic        0.07
    رش            0.06
    \E            0.06
    VRTX          0.06
    (ret          0.06
    Activation Density: 0.002%

    No Known Activations