INDEX
Explanations
words related to attitudes or behavior
references to attitudes or behaviors
Negative Logits
enegger         -0.89
icles           -0.86
dry             -0.76
oval            -0.72
icle            -0.69
idden           -0.67
aban            -0.67
Interstitial    -0.66
rake            -0.66
icular          -0.66
Positive Logits
toward          1.50
towards         1.41
attitude        1.15
Towards         1.12
attitudes       0.95
Tow             0.94
stance          0.79
uation          0.75
demeanor        0.73
hostility       0.72
Activations Density 0.039%