INDEX
Explanations
words related to negativity and adverse effects
phrases associated with negative impact or perceptions
Negative Logits
plain       -0.93
ource       -0.84
Hide        -0.84
ablished    -0.78
here        -0.78
nington     -0.77
racuse      -0.76
ruff        -0.75
REDACTED    -0.74
DOM         -0.74
Positive Logits
consequences     1.21
impact           1.15
impacts          1.13
repercussions    1.12
publicity        1.11
effects          1.06
feedback         1.06
affect           1.05
gearing          1.05
stereotypes      1.05
Activations Density 0.045%