INDEX
    Explanations

    mentions of famous personalities and political figures along with negative associations

    Negative Logits
    Token         Logit
    "eem"         -0.71
    " Apart"      -0.66
    " behold"     -0.60
    "Interested"  -0.60
    " Pair"       -0.58
    "icking"      -0.57
    "etting"      -0.55
    "TG"          -0.55
    "hand"        -0.55
    "ove"         -0.54
    Positive Logits
    Token         Logit
    " been"       1.70
    "been"        1.44
    " undergone"  1.30
    " gotten"     1.25
    " become"     1.25
    " begun"      1.22
    " Been"       1.15
    " risen"      1.12
    " gone"       1.12
    " arisen"     1.07
    Activation Density: 0.803%

    No Known Activations