INDEX
Explanations
words related to criticism and strong negative reactions towards specific individuals or groups
negative statements or criticisms directed towards individuals or groups
Negative Logits
archment    -0.68
ILCS        -0.67
hazard      -0.66
ance        -0.66
iac         -0.65
OTE         -0.65
1920        -0.64
hyde        -0.63
ason        -0.63
thing       -0.63
Positive Logits
criticism      0.94
criticisms     0.89
suggestions    0.87
critics        0.86
comments       0.80
questioning    0.78
commenters     0.77
critic         0.77
remarks        0.76
accusations    0.73
Activations Density 0.213%