INDEX
Explanations
phrases related to prioritizing specific groups or interests above others, often involving a sense of risk or danger
topics related to American workers, jobs, and interests
Negative Logits
Winged      -0.65
etheless    -0.64
apply       -0.60
oris        -0.55
Changed     -0.52
Evening     -0.52
Scrolls     -0.51
Subtle      -0.51
Same        -0.51
dict        -0.50
Positive Logits
squarely    1.24
ahead       0.95
firmly      0.87
into        0.83
closer      0.81
overboard   0.81
upside      0.79
atop        0.78
smack       0.77
onto        0.77
Activations Density 0.120%