INDEX
Explanations
instances of conflict or protest-related language
Negative Logits
cono              -0.16
imest             -0.16
opp               -0.16
ennon             -0.15
ContentAlignment  -0.15
artner            -0.15
imer              -0.15
richest           -0.15
isto              -0.15
emouth            -0.14
Positive Logits
anford    0.17
Avg       0.16
onia      0.16
aked      0.15
AGED      0.15
aged      0.15
singular  0.15
sage      0.14
ؤاÙĦ      0.14
wig       0.14
Activations Density: 0.093%