INDEX
Explanations
mentions of riot-related activities or events
references to riots and related violent events
Negative Logits
metics        -0.72
hran          -0.70
omething      -0.68
DonaldTrump   -0.67
pta           -0.66
hered         -0.66
ĻĤ            -0.65
ULTS          -0.64
mathemat      -0.63
ournal        -0.63
Positive Logits
ous           1.07
ers           0.90
ing           0.84
naire         0.84
ously         0.84
tro           0.83
rained        0.81
riot          0.81
osity         0.79
auld          0.78
Activations Density 0.055%