Explanations
phrases related to demonizing or criticizing individuals or groups
terms associated with demonization and stigma
Negative Logits
Token     Logit
ippi      -0.85
RAFT      -0.78
IGH       -0.74
ļéĨĴ      -0.71
jri       -0.69
Soda      -0.68
Seym      -0.67
orship    -0.67
Seah      -0.65
proble    -0.65
Positive Logits
Token     Logit
stration  1.05
iac       1.01
ises      0.96
ising     0.96
ised      0.93
izing     0.92
oid       0.91
izes      0.90
oids      0.87
ormal     0.85
Activations Density: 0.008%