INDEX
Explanations
terms related to discrimination and civil rights violations
Negative Logits
lify            -0.17
aho             -0.15
ift             -0.15
oram            -0.15
ilda            -0.14
cn              -0.14
_SIGNATURE      -0.14
.ManyToMany     -0.14
ough            -0.14
esty            -0.14
Positive Logits
against         0.18
against         0.18
Against         0.18
Against         0.15
taste           0.15
zew             0.15
rzy             0.15
towards         0.14
hiring          0.14
ellen           0.14
Activations Density 0.028%