INDEX
Explanations
mentions of specific regions or locations
geographical and demographic references in the context of conflict or social issues
Negative Logits
uted         -0.59
imens        -0.56
ibur         -0.54
iste         -0.52
Doctors      -0.51
onite        -0.51
Materials    -0.50
mand         -0.49
TPP          -0.49
laughs       -0.48
Positive Logits
&            1.28
/            1.16
/            1.11
&            1.09
terday       0.96
or           0.87
AND          0.86
/$           0.80
and          0.79
/_           0.77
Activations Density 0.928%