    Explanations

    references to demographic topics or terms related to identity and social categorization

    Negative Logits
    Token         Logit
    "vet"         -0.16
    "lied"        -0.16
    "achs"        -0.16
    "iously"      -0.15
    "elib"        -0.14
    " ogs"        -0.14
    "度"          -0.14
    "edback"      -0.14
    "ificant"     -0.14
    "bd"          -0.14
    Positive Logits
    Token         Logit
    " dem"        0.23
    " Dem"        0.21
    "Dem"         0.18
    "dem"         0.17
    " DEM"        0.17
    "urge"        0.16
    "meni"        0.15
    "214"         0.15
    "ographics"   0.15
    "stration"    0.15
    Activation Density: 0.015%

    No Known Activations