INDEX
    Explanations

    self-described or proclaimed identities or affiliations

    phrases that refer to self-identifying labels or descriptors

    Negative Logits
    Token              Logit
    " Shoes"           -0.81
    "perature"         -0.81
    "isson"            -0.74
    "utton"            -0.73
    "vertisement"      -0.72
    "inson"            -0.71
    "von"              -0.71
    "reau"             -0.71
    "elight"           -0.71
    "orrow"            -0.71
    Positive Logits
    Token              Logit
    " adherent"        0.88
    " believer"        0.81
    " caliphate"       0.80
    " atheist"         0.76
    " millennial"      0.75
    " badass"          0.72
    " bigot"           0.70
    " democratic"      0.70
    " socialist"       0.70
    " pacif"           0.69
    Activation Density: 0.060%

    No Known Activations