INDEX
Explanations
negative sentiments or criticisms directed toward specific individuals or groups
Negative Logits
Token            Logit
appre            -0.80
circulation      -0.74
exper            -0.71
umb              -0.70
aggregate        -0.69
plur             -0.69
square           -0.68
warranty         -0.67
specialization   -0.66
reproduction     -0.66
Positive Logits
Token            Logit
era              1.32
appointed        1.31
inspired         1.21
esque            1.18
loving           1.17
induced          1.16
driven           1.15
campaign         1.12
Clinton          1.10
centered         1.09
Activations Density: 0.026%