INDEX
Explanations
words related to negative or harmful actions towards individuals or groups
Negative Logits
istrates     -0.74
çīĪ          -0.73
GOODMAN      -0.67
giveaway     -0.67
代           -0.66
HCR          -0.65
spot         -0.63
nesday       -0.62
åĮ           -0.61
mits         -0.61
Positive Logits
aging         1.87
agement       1.76
aged          1.69
ages          1.61
age           1.49
AGE           1.47
AGES          1.27
agers         1.21
agements      1.15
ager          1.13
Activations Density: 0.063%