INDEX
Explanations
words related to demonization or criticism
references to demonization and related concepts
Negative Logits
ippi         -0.97
RAFT         -0.79
IGH          -0.74
ļéĨĴ         -0.69
sers         -0.69
Ã¥           -0.68
Seah         -0.66
aird         -0.65
ILLE         -0.65
rehensive    -0.65
Positive Logits
stration     0.94
iac          0.93
ises         0.87
ised         0.86
izing        0.86
ising        0.86
ized         0.82
ization      0.81
oid          0.81
izes         0.81
Activations Density 0.007%