INDEX
    Negative Logits
    Token               Logit
    "Probe"             -0.07
    " Wer"              -0.06
    "Sl"                -0.06
    (not shown)         -0.06
    "ain"               -0.06
    " Uganda"           -0.06
    " کودکان"           -0.06
    " hygiene"          -0.06
    "Walker"            -0.06
    "beam"              -0.06
    Positive Logits
    Token               Logit
    ".Xna"              0.07
    "flip"              0.07
    " sto"              0.06
    " kuruluş"          0.06
    " repell"           0.06
    "getApplication"    0.06
    (not shown)         0.06
    "_inter"            0.06
    "ength"             0.06
    " childish"         0.06
    Activation Density: 0.001%

    No Known Activations