INDEX
Explanations
marginalized groups and populations
NEGATIVE LOGITS
uomini         0.74
monde          0.71
hombres        0.71
sincron        0.68
attitudes      0.68
linear         0.68
Wilde          0.64
synchron       0.64
➚              0.63
hype           0.63
POSITIVE LOGITS
minorities     1.05
minority       1.05
vulnerable     0.97
Minority       0.93
Vulner         0.90
ulnerable      0.88
marginalized   0.88
disadvantaged  0.87
弱             0.87
marginalised   0.86
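Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix (W_U) and ranking the per-token scores; the most positive scores are the tokens the feature promotes and the most negative are the tokens it suppresses. This page does not say exactly how its lists were computed, so the sketch below is an assumption: the function names (`logit_effects`, `top_tokens`) are hypothetical, the inputs are toy numbers rather than real model weights, and any normalization a dashboard might apply is omitted.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary entry by decoder_dir @ W_U.

    decoder_dir: the feature's decoder row, length d_model (hypothetical input).
    unembed: W_U as d_model rows, each of length vocab_size.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal the number of W_U rows")
    vocab_size = len(unembed[0])
    scores = [0.0] * vocab_size
    # Accumulate the dot product of decoder_dir with each column of W_U.
    for weight, row in zip(decoder_dir, unembed):
        for t in range(vocab_size):
            scores[t] += weight * row[t]
    return scores


def top_tokens(scores: Sequence[float], vocab: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest and k lowest scoring tokens."""
    if len(scores) != len(vocab):
        raise ValueError("scores and vocab must have the same length")
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three placeholder tokens.
    scores = logit_effects([1.0, -0.5],
                           [[0.25, 1.0, -0.5],
                            [0.5, -0.25, 0.75]])
    positive, negative = top_tokens(scores, ["a", "b", "c"], k=1)
    print(positive)  # [('b', 1.125)]
    print(negative)  # [('c', -0.875)]
```

Sorting the full vocabulary is fine for a sketch; for a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` avoid a full sort.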
Activation Density 0.233%
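Activation density is usually read as the percentage of tokens on which the feature fires, so 0.233% corresponds to roughly 233 activating tokens per 100,000. The minimal sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the threshold parameter are hypothetical, and the dashboard may use a different cutoff or token sample.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Toy sample: 233 firing tokens out of 100,000.
    sample = [0.0] * 99_767 + [1.0] * 233
    print(activation_density(sample))
```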