INDEX
Explanations
political criticism targeting specific individuals or groups
phrases that involve criticism or strong disapproval of entities and individuals, particularly in the political context
Negative Logits
Prest            -0.71
Caldwell         -0.67
llular           -0.65
along            -0.63
Approximately    -0.62
aea              -0.62
erves            -0.61
Seah             -0.60
Med              -0.60
VILLE            -0.59
Positive Logits
hypocrisy        1.05
inaction         0.97
coward           0.95
unfair           0.92
motives          0.91
sexist           0.90
arrogance        0.89
inconsistency    0.89
bigotry          0.89
perceived        0.88
Activations Density 0.456%