INDEX
Explanations
mentions of the United States (U.S.) and its actions in various international contexts
mentions of the United States
Negative Logits
Token         Logit
Noir          -0.76
Wicked        -0.75
Bach          -0.69
CPC           -0.69
Cinderella    -0.66
juggling      -0.66
Preferred     -0.64
bars          -0.63
ワン           -0.62
Referred      -0.62
Positive Logits
Token         Logit
prising       1.22
nexpected     1.13
seless        0.99
lyss          0.98
gly           0.96
NA            0.93
prise         0.92
zbek          0.92
rien          0.92
DF            0.92
Activation density: 0.054%
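The dashboard does not define activation density. A minimal sketch, assuming the common convention that density is the fraction of tokens on which the feature's activation is above zero: at 0.054%, the feature would fire on about 54 of every 100,000 tokens. The function name `activation_density` and the synthetic activation list are hypothetical illustrations, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, of which 54 have a nonzero activation.
acts = [0.0] * 100_000
for i in range(54):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3%}")  # → 0.054%
```

Under this assumption a density this low means the feature is sparse: it stays silent on almost all tokens and fires only in the narrow contexts described in the explanations.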