Explanations
references to the United States military and international affairs
the presence of the abbreviation "U.S." or references to the United States
Negative Logits
Attribution    -0.65
izen    -0.65
ãĥ¼ãĥĨãĤ£    -0.58
ax    -0.57
ãĥ¢    -0.56
amaz    -0.56
proceeds    -0.55
HF    -0.55
illin    -0.55
{"    -0.54
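Two of the negative-logit tokens above (ãĥ¼ãĥĨãĤ£ and ãĥ¢) look like byte-level BPE token strings rather than encoding damage. The following is a minimal sketch for reading them, assuming the model uses a GPT-2-style byte-to-unicode mapping; the source does not name the tokenizer, so that is an assumption. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed token string back to bytes, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens can be partial UTF-8 sequences, so undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ¼ãĥĨãĤ£", "ãĥ¢", "Nations"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the two tokens would decode to the katakana fragments ーティ and モ, while plain ASCII tokens such as Nations are unchanged.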
Positive Logits
S    1.06
Nations    0.90
$.    0.86
Va    0.83
Soccer    0.81
N    0.78
CLASSIFIED    0.76
NAT    0.76
States    0.70
K    0.70
Activation Density: 0.050%