INDEX
    Explanations

    terms related to gender characteristics and their representations

    feminine and masculine distinctions

    Negative Logits
    "väg"         -0.48
    " previs"     -0.47
    " Decke"      -0.46
    " Anexo"      -0.46
    " passage"    -0.45
    "Lihat"       -0.44
                  -0.44
    " stället"    -0.44
    " provis"     -0.43
    " Hift"       -0.43
    Positive Logits
    " Feminine"      0.96
    " feminine"      0.94
    " femininity"    0.81
    " feminino"      0.69
    " femeninos"     0.69
    " feminina"      0.68
    " femininas"     0.66
    " FEM"           0.65
    " femininos"     0.65
    " masculine"     0.64
    Activation Density: 0.005%

    No Known Activations