INDEX
    Explanations

    mentions of riot-related activities or events

    references to riots and related violent events

    Negative Logits
    Token            Logit
    "metics"         -0.72
    "hran"           -0.70
    "omething"       -0.68
    "DonaldTrump"    -0.67
    "pta"            -0.66
    "hered"          -0.66
    "ĻĤ"             -0.65
    "ULTS"           -0.64
    " mathemat"      -0.63
    "ournal"         -0.63
    Positive Logits
    Token            Logit
    "ous"            1.07
    "ers"            0.90
    "ing"            0.84
    "naire"          0.84
    "ously"          0.84
    "tro"            0.83
    "rained"         0.81
    " riot"          0.81
    "osity"          0.79
    "auld"           0.78
    Activation Density: 0.055%

    No Known Activations