INDEX
    Explanations

    expressions related to demographic categories such as age, income, race, gender, and background

    Negative Logits
    "tein"    -0.80
    "rence"   -0.77
    "gotten"  -0.71
    "ãĥį"     -0.67
    "ãģį"     -0.66
    "owicz"   -0.66
    "prison"  -0.64
    "enko"    -0.64
    "Tro"     -0.63
    "ש"       -0.63
    Positive Logits
    " imaginable"          1.17
    " ranging"             1.04
    "etting"               1.04
    " vying"               0.97
    " guiActiveUnfocused"  0.89
    " depending"           0.88
    " simultaneously"      0.87
    "paces"                0.85
    " differing"           0.82
    "cale"                 0.82
    Activation Density: 0.304%

    No Known Activations