    Negative Logits
    Token           Logit
    "Gender"        -0.07
    "ACKET"         -0.07
    " gloves"       -0.06
    "aupt"          -0.06
    " corpus"       -0.06
    " Gloves"       -0.06
    " silver"       -0.06
    " نبود"         -0.06
    " safeguard"    -0.06
    "()>↵"          -0.06
    Positive Logits
    Token           Logit
    " Lebanese"     0.07
    "ylene"         0.07
    " anzeigen"     0.07
    "ilarity"       0.07
    "Wunused"       0.07
    "aciones"       0.07
    "bnb"           0.06
    " Almanya"      0.06
    " hormonal"     0.06
    "_To"           0.06
    Act Density 0.002%

    No Known Activations