    Explanations

    phrases related to people and social interactions

    references to various groups of people, often in negative or stereotypical contexts

    Negative Logits
    Token            Logit
    " Pyr"           -0.52
    " Cur"           -0.49
    "IER"            -0.48
    " Byr"           -0.48
    "Prim"           -0.47
    " Grind"         -0.47
    "igor"           -0.47
    " incumbent"     -0.47
    " Vulcan"        -0.47
    " Ranger"        -0.46

    Positive Logits
    Token            Logit
    " rejoice"       0.85
    " unite"         0.76
    "hate"           0.75
    " beware"        0.74
    " sue"           0.73
    " adore"         0.73
    " prefer"        0.71
    " disapprove"    0.70
    " dont"          0.70
    "paces"          0.69

    Activation Density: 0.204%

    No Known Activations