INDEX
    Explanations

    adjectives and descriptions related to people and their behaviors

    sociopolitical labels and identities

    Negative Logits
    "tions"         -0.68
    " partName"     -0.62
    " GOODMAN"      -0.61
    "tails"         -0.58
    "iland"         -0.57
    "ancies"        -0.55
    "mentioned"     -0.54
    "uyomi"         -0.52
    "mares"         -0.52
    " waiver"       -0.52
    Positive Logits
    " inferior"     0.70
    " discipl"      0.66
    " underdog"     0.65
    " unworthy"     0.65
    " undermin"     0.62
    " achievable"   0.61
    " martyr"       0.60
    " deserving"    0.60
    " savior"       0.59
    " superior"     0.58
    Activation Density: 0.656%

    No Known Activations