INDEX
    Explanations

    words associated with masculine and feminine characteristics

    Negative Logits
    Token       Logit
    "aneers"    -0.77
    "stall"     -0.76
    "nl"        -0.69
    "tein"      -0.69
    "pan"       -0.65
    "umblr"     -0.64
    "OUT"       -0.64
    "ettel"     -0.63
    "zan"       -0.62
    "ciples"    -0.62
    Positive Logits
    Token              Logit
    "atively"          0.91
    "ively"            0.88
    "ativity"          0.83
    " affili"          0.73
    "ative"            0.72
    " associations"    0.71
    "enza"             0.69
    "iated"            0.69
    "hips"             0.68
    "ãĥł"              0.68
    Activation Density: 0.073%

    No Known Activations