    Explanations

    mentions of different groups of people

    references to groups of people

    Negative Logits
    Token             Logit
    "tains"           -0.88
    "opens"           -0.71
    "iHUD"            -0.70
    "¿½"              -0.68
    "Increases"       -0.68
    "manent"          -0.67
    " Flavoring"      -0.67
    "CONT"            -0.65
    "forestation"     -0.63
    " Prel"           -0.63
    Positive Logits
    Token             Logit
    " aren"           1.57
    " deserve"        1.55
    " ARE"            1.36
    " weren"          1.33
    " shouldn"        1.32
    " are"            1.32
    " despise"        1.30
    " don"            1.29
    " ain"            1.28
    " suck"           1.28
    Activation Density: 0.421%
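
    The density figure above is usually read as the fraction of token positions on which the feature fires. The sketch below shows one way such a number could be computed. It is an illustration under stated assumptions, not the method used to produce the value on this page: the function name, the zero threshold, and the toy activation list are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" at a position when its
    activation is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 4 of which are active.
toy_activations = [0.0] * 996 + [0.3, 1.2, 0.7, 2.1]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.421% would mean the feature is active on roughly 4 of every 1000 tokens sampled.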

    No Known Activations