INDEX
Explanations
references related to conflict, contentious incidents, or human rights violations
phrases indicating actions or violations related to invasion and oppression
Negative Logits
partName    -0.82
istg        -0.71
ancest      -0.71
cano        -0.65
soType      -0.64
bet         -0.64
ringe       -0.64
aldo        -0.61
eus         -0.61
fters       -0.60
Positive Logits
sorts        0.76
fossil       0.68
Goods        0.67
course       0.67
Trafford     0.66
civilians    0.64
wildlife     0.63
ARS          0.62
minorities   0.59
Warfare      0.59
Activation Density: 0.169%