INDEX
    Explanations

    specific groups of people based on characteristics or affiliations

    phrases related to discrimination and targeted hate speech

    Negative Logits
    "staking"        -0.74
    "thumbnails"     -0.74
    "flex"           -0.73
    "mares"          -0.69
    "flows"          -0.69
    "Alert"          -0.67
    "blocks"         -0.67
    "orders"         -0.65
    "amar"           -0.65
    "ulations"       -0.65
    Positive Logits
    " particular"     1.23
    " person"         1.10
    " specific"       1.08
    " subset"         1.06
    " individual"     1.05
    " constituent"    1.00
    " deity"          1.00
    " entity"         0.97
    " piece"          0.96
    " perpetrator"    0.95
    Activation Density: 0.431%

    No Known Activations