INDEX
Explanations
words related to attitudes
references to societal attitudes and perceptions
Negative Logits
bid       -0.78
gran      -0.75
ded       -0.74
MER       -0.74
avez      -0.73
addafi    -0.71
aman      -0.70
amaz      -0.69
icles     -0.68
amen      -0.68
Positive Logits
attitudes      1.27
attitude       0.93
guiActiveUn    0.92
toward         0.80
insula         0.79
incent         0.79
ocial          0.79
towards        0.79
yip            0.79
terness        0.77
Activations Density: 0.007%