Explanations
references to violence and human rights violations
Negative Logits
战争      -0.18
wars      -0.17
ixel      -0.16
oreach    -0.16
ippi      -0.15
Wars      -0.15
reate     -0.15
navr      -0.14
endale    -0.14
isse      -0.14
Positive Logits
_hdl            0.17
demonstrations  0.16
Jasmine         0.16
protesters      0.16
Gutenberg       0.15
Quy             0.15
demonstrators   0.15
suppress        0.15
street          0.15
barric          0.15
Activations Density 0.088%