Explanations
phrases related to discrimination and equality
Negative Logits
тин          -0.08
amil         -0.07
žit          -0.07
ingroup      -0.07
ojí          -0.07
eyi          -0.07
жень         -0.07
Při          -0.07
모두         -0.07
Rosenstein   -0.07
Positive Logits
or           0.07
ado          0.06
whether      0.06
是否         0.06
alone        0.06
lack         0.06
conscience   0.06
merely       0.06
perceived    0.06
observ       0.05
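The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how lists like these are commonly produced. It assumes (this is not stated on the page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The helper name `top_and_bottom_logits` and the toy vectors are hypothetical and are not the weights of the model behind this page.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def top_and_bottom_logits(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative (token, score) pairs.

    Assumption: a token's score is the dot product of the feature's decoder
    direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy 2-d example with made-up vectors (not real model weights).
toy_unembed = {
    "whether": [0.9, 0.1],
    "or": [1.0, 0.0],
    "ingroup": [-0.8, 0.2],
    "amil": [-0.5, -0.1],
}
positive, negative = top_and_bottom_logits([1.0, 0.0], toy_unembed, k=2)
print(positive)  # [('or', 1.0), ('whether', 0.9)]
print(negative)  # [('ingroup', -0.8), ('amil', -0.5)]
```

Sorting the full score table once keeps the sketch simple; for a vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid the full sort.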
Activations Density 0.009%
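The density figure is the share of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero over some evaluation set of tokens; the threshold and the helper name `activation_density` are hypothetical, since the page does not say how the figure was computed.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "firing" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# 9 firing tokens out of 100,000 gives the 0.009% shown above.
print(activation_density([2.5] * 9 + [0.0] * 99_991))  # 0.009
```

At 0.009%, the feature is active on roughly 9 tokens in every 100,000, so it fires very sparsely.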