INDEX
    Explanations

    words related to possessive pronouns

    instances of gender-neutral references

    Negative Logits
    Token             Logit
    "Integ"           -0.66
    " prelim"         -0.60
    "utsche"          -0.60
    "AIR"             -0.60
    " vib"            -0.57
    "Interstitial"    -0.56
    " names"          -0.54
    " rap"            -0.54
    " clearly"        -0.54
    " Names"          -0.53
    Positive Logits
    Token             Logit
    "nam"             0.93
    "acles"           0.85
    "Else"            0.81
    "henko"           0.81
    "chid"            0.80
    "ifice"           0.77
    "ilings"          0.76
    "acle"            0.74
    "ãĥ¼ãĥĨ"          0.74
    "acular"          0.73
    Activation Density: 0.055%

    No Known Activations