    Explanations

    specific patterns or suffixes in words related to gender, social roles, or categories

    Negative Logits (tokens quoted so leading spaces are visible)
      Token              Logit
      " SCIP"            -0.17
      " Ashe"            -0.16
      "igit"             -0.16
      " PST"             -0.16
      "ASH"              -0.16
      " Sting"           -0.15
      "aterno"           -0.15
      " Bast"            -0.14
      "ISK"              -0.14
      ".SimpleButton"    -0.14
    Positive Logits (tokens quoted so leading spaces are visible)
      Token              Logit
      "SS"                0.63
      " ss"               0.61
      "ss"                0.60
      " SS"               0.59
      "_ss"               0.54
      ".ss"               0.51
      "есс" (Cyrillic)    0.51
      "(ss"               0.51
      "ess"               0.50
      ":ss"               0.49
    Activation Density: 0.133%
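    Activation density is commonly defined as the fraction of token positions on which a feature's activation is above zero. The page does not state the threshold or the dataset behind the 0.133% figure, so both are assumptions in this minimal sketch, and `activation_density` is a hypothetical helper. At 0.133%, the feature would fire on roughly 133 of every 100,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    ASSUMPTION: "active" means strictly greater than the threshold (default 0.0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Hypothetical data: 133 active positions out of 100,000 tokens.
acts = [0.0] * 99_867 + [1.0] * 133
print(f"{activation_density(acts):.3%}")
```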

    No Known Activations