INDEX
    Explanations

    promoting prejudice and discrimination

    NEGATIVE LOGITS
    Token (quoted; a leading space is part of the token)    Logit
    " femminile"        0.55
    " feminine"         0.54
    "女性"               0.50
    " жі"               0.49
    "female"            0.48
    "Female"            0.47
    " feminina"         0.47
    "femin"             0.47
    " fémin"            0.46
    "ktop"              0.46
    POSITIVE LOGITS
    Token (quoted; a leading space is part of the token)    Logit
    " prejudice"        1.62
    " hatred"           1.52
    " hate"             1.41
    " bigotry"          1.41
    " prejudices"       1.40
    " prejudiced"       1.38
    " discrimination"   1.35
    " Prejudice"        1.35
    " racism"           1.25
    " hateful"          1.25
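    The positive and negative logit lists are the tokens whose output logits this feature most raises and most lowers. A common way to build such lists, assumed here rather than confirmed for this dashboard, is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below uses a made-up 2-dimensional residual stream and a hypothetical five-token vocabulary (`DECODER_DIR` and `UNEMBED` are illustrative names, not real model weights). It keeps the sign on the negative side, so its negative scores carry a minus sign, whereas the panel above prints unsigned values.

```python
from typing import Dict, List, Tuple

# Hypothetical toy weights: NOT taken from any real model or SAE.
DECODER_DIR: List[float] = [1.0, -0.5]          # feature's decoder direction
UNEMBED: Dict[str, List[float]] = {              # token -> unembedding vector
    " prejudice": [1.2, -0.4],
    " hatred": [1.0, -0.3],
    " feminine": [-0.3, 0.5],
    "Female": [-0.2, 0.4],
    " the": [0.0, 0.0],
}


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct effect of the feature on each token's logit (a dot product)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


pos, neg = top_and_bottom(logit_effects(DECODER_DIR, UNEMBED), k=2)
for tok, score in pos:
    print(f"+ {tok!r:14} {score:.2f}")
for tok, score in neg:
    print(f"- {tok!r:14} {score:.2f}")
```

    With these toy numbers the sketch ranks " prejudice" and " hatred" on the positive side and " feminine" and "Female" on the negative side, mirroring the shape (not the values) of the panels above.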
    Activation Density 0.055%
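    Activation density is usually read as the fraction of sampled tokens on which the feature fires; 0.055% then means roughly one token in 1,800. The sketch below assumes that convention with a firing threshold of 0, which is an assumption: the dashboard does not state its threshold or the size of its token sample. `activation_density` is a hypothetical helper name, and the 20,000-token list is synthetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic example: the feature fires on 11 of 20,000 token positions.
acts = [0.0] * 20_000
for i in range(0, 11_000, 1_000):  # positions 0, 1000, ..., 10000 (11 in total)
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.055%
```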

    No Known Activations