INDEX
    Explanations
    Negative Logits
    Token              Logit
    " LO"              -0.07
    "SEMB"             -0.06
    "Anonymous"        -0.06
    " OMIT"            -0.06
                       -0.06
    " pornography"     -0.06
    " butter"          -0.06
    " CZ"              -0.06
    " сейчас"          -0.06
    " componentDid"    -0.06
    Positive Logits
    Token              Logit
    " друж"            0.07
    "(fr"              0.07
    "[random"          0.07
    "(fd"              0.07
    "فات"              0.07
    "비아"             0.06
    " dreaded"         0.06
    " jub"             0.06
    "Psi"              0.06
    " qualitative"     0.06
    Activation Density: 0.025%

    No Known Activations