INDEX
Explanations
references to the United Nations (UN)
the presence of references to the United Nations
Negative Logits
Feldman        -0.72
conservatism   -0.69
constituency   -0.65
fascism        -0.65
shave          -0.62
goodbye        -0.62
Slate          -0.62
laz            -0.62
everything     -0.61
Sel            -0.61
Positive Logits
UN             4.08
UNE            1.91
UN             1.78
un             1.65
uns            1.59
UNCH           1.39
UL             1.39
unt            1.38
UM             1.35
OU             1.34
Activations Density 0.011%