Explanations
references to protests or protest-related activities
references to protests or acts of demonstration
Negative Logits
nown          -0.79
theless       -0.78
illac         -0.74
efficients    -0.72
oiler         -0.70
ccording      -0.69
ewater        -0.69
ubi           -0.67
paste         -0.67
estial        -0.65
Positive Logits
encamp        0.87
ations        0.87
protestors    0.79
protesting    0.78
aires         0.77
rained        0.76
protest       0.76
atos          0.74
ors           0.74
istas         0.74
Activations Density: 0.036%