Explanations
mentions of the American population or specific demographics within America
mentions of "Americans" and "Canadians."
Negative Logits
Initialized   -0.72
bol           -0.64
cer           -0.64
Guru          -0.63
Drag          -0.62
selection     -0.62
Malaysia      -0.62
Nanto         -0.61
ring          -0.61
efully        -0.61
Positive Logits
hip           0.99
'             0.82
tuned         0.80
ourcing       0.79
ugi           0.77
living        0.77
ourced        0.76
distrust      0.74
who           0.74
hips          0.74
Activations Density 0.066%