INDEX
    Explanations
    Negative Logits
    Token          Logit
    " +"           -0.07
    " =>"          -0.07
    " 나오"        -0.07
    "Periph"       -0.06
    " patio"       -0.06
    "($('<"        -0.06
    " bravery"     -0.06
    " jihad"       -0.06
    (blank)        -0.06
    "ğini"         -0.06
    Positive Logits
    Token          Logit
    "-"            0.11
    "}-"           0.08
    "maması"       0.08
    "‐'"           0.07
    "-N"           0.07
    " nawet"       0.07
    "-x"           0.07
    "кими"         0.06
    "(stypy"       0.06
    "机构"         0.06
    Activation Density: 0.024%

    No Known Activations