INDEX
Explanations
countries or regions
references to specific countries and their actions or statuses
Negative Logits
uces          -0.79
bum           -0.68
Kills         -0.63
getic         -0.63
grades        -0.63
Writ          -0.63
ipl           -0.62
Translation   -0.62
reads         -0.62
Lear          -0.61
POSITIVE LOGITS
have          1.32
are           1.29
owe           1.17
intend        1.13
contend       1.12
rely          1.12
boast         1.10
refuse        1.09
operate       1.08
insist        1.08
Activations Density 0.331%
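
The page does not say how the density figure is computed. One common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must contain at least one value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1,000 token positions, 3 of which are active.
toy_activations = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(toy_activations)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.331% would mean the feature fires on roughly 1 in 300 sampled tokens.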