INDEX
    Explanations
    Negative Logits
    ماش
    -0.08
    -0.08
    ميز
    -0.07
    -0.07
    Rejected
    -0.07
    ане
    -0.07
    Destroyed
    -0.07
    бор
    -0.07
    -0.07
    -0.07
    Positive Logits
     eateries
    0.08
    onts
    0.08
    itäten
    0.08
     Alzheimer's
    0.08
     ganas
    0.08
     baixos
    0.07
    lak
    0.07
    మ్మ
    0.07
    ా�
    0.07
     Kensington
    0.07
    Activation Density: 0.001%

    No Known Activations