INDEX
Explanations
references to marginalized groups and discussions regarding systemic inequality
Negative Logits
Token          Logit
people         -0.64
someone        -0.63
someone        -0.62
individuals    -0.62
itself         -0.60
somebody       -0.60
Itself         -0.59
somebody       -0.58
mensen         -0.58
itself         -0.58

Positive Logits
Token          Logit
whom           1.03
whom           0.82
whose          0.62
whose          0.58
Whom           0.57
who            0.57
Whom           0.53
wheelchairs    0.52
الذين          0.51
who            0.50

Activations Density 0.635%