INDEX
Explanations
marginalized communities and identities
Negative Logits
बल्लेबाज      0.41
Businessman   0.40
Frenchman     0.40
Dutchman      0.40
войска        0.40
sportsman     0.39
पुत्र           0.39
housewife     0.38
Mummy         0.38
স্বামীর         0.37
Positive Logits
BIP           1.15
marginalized  1.04
POC           1.03
POC           0.98
marginalised  0.91
cis           0.88
fol           0.82
neuro         0.81
poc           0.80
bip           0.79
Activations Density: 0.098%