INDEX
Explanations
references to countries and governmental bodies, particularly the U.S.
mentions of the U.S. government and related entities
Negative Logits
eleph       -0.86
bending     -0.70
bet         -0.69
oba         -0.64
horizont    -0.62
bunny       -0.62
aber        -0.62
mosqu       -0.60
bda         -0.59
ahime       -0.59
Positive Logits
S           1.30
Va          0.93
Y           0.91
K           0.86
¥           0.86
N           0.84
KS          0.84
Ds          0.83
ª           0.83
American    0.80
Activations Density: 0.049%