INDEX
    Explanations

    phrases related to specific professions or characteristics describing people

    words that denote various social roles and identities

    Negative Logits
    tails  -0.74
    izens  -0.71
     regions  -0.70
    Us  -0.70
    rams  -0.69
     Lans  -0.68
    ouls  -0.67
     slopes  -0.67
    olas  -0.66
     bins  -0.66
    Positive Logits
    digy  0.79
     unto  0.76
    iste  0.75
    ess  0.74
    chuk  0.72
     herself  0.72
    smith  0.71
    alyst  0.70
     myself  0.70
    nik  0.69
    Activation Density: 0.278%

    No Known Activations