Explanations
other protected characteristics
Negative Logits
Token       Value
estoque     0.47
aparel      0.44
steric      0.44
pretzels    0.44
و           0.43
haci        0.42
แต่          0.42
aulas       0.41
be          0.41
wages       0.41
Positive Logits
Token       Value
in          0.84
ap          0.73
я           0.69
x           0.65
s           0.64
اک          0.62
h           0.60
at          0.59
в           0.58
up          0.57
Activations Density: 0.404%