    Explanations

    references to gender and pronouns

    pronouns and gendered terms

    Negative Logits
        " Piece"          -0.49
        " Pile"           -0.49
        "enters"          -0.47
        "yip"             -0.47
        " Vital"          -0.46
        " Pure"           -0.45
        "osto"            -0.45
        "aneously"        -0.44
        "apas"            -0.44
        (token not rendered)  -0.44

    Positive Logits
        " pronouns"        0.62
        " pronoun"         0.54
        "ьаж"              0.48
        " &___"            0.47
        " Comprometido"    0.44
        " orgánico"        0.41
        " integridad"      0.41
        "Étymologie"       0.41
        " humo"            0.40
        "astéro"           0.40
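The page does not say how these logit lists are produced. A common approach on sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and rank the resulting per-token effects. The sketch below illustrates that approach on toy numbers. It is an assumption, not this page's confirmed method, and the names `logit_effects`, `top_logits`, `decoder_dir`, and `unembed` are hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumed layout: `unembed` has d_model rows and vocab_size columns, so the
    effect on token t is sum_i decoder_dir[i] * unembed[i][t].
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal d_model")
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(vocab_size)
    ]

def top_logits(effects: Sequence[float], tokens: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
toy_tokens = ["a", "b", "c"]
toy_effects = logit_effects([1.0, -0.5], [[0.2, 0.6, -0.4], [0.4, -0.2, 0.2]])
neg, pos = top_logits(toy_effects, toy_tokens, k=1)
print("most negative:", neg, "most positive:", pos)
```

Under this assumption, tokens such as " pronouns" and " pronoun" rank highest because their unembedding columns align most closely with the feature's decoder direction.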
    Activation Density 0.004%
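Assuming activation density means the percentage of sampled tokens on which the feature's activation is above zero, 0.004% corresponds to roughly 4 firing tokens per 100,000. The sketch below shows that calculation. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: 4 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in (10, 2_000, 50_000, 99_999):
    acts[i] = 1.3
print(f"{activation_density(acts):.3f}%")  # prints "0.004%"
```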

    No Known Activations