INDEX
Explanations
identifying male individuals
Negative Logits
ಹೊಂದಿ       0.43
WOMEN       0.42
╞           0.42
women       0.41
䟧          0.40
women       0.38
menstrual   0.38
زنان        0.38
amén        0.38
Jij         0.38
Positive Logits
guy         3.73
guys        3.48
Guy         3.39
Guy         3.38
guy         3.27
guys        3.22
Guys        3.16
Guys        3.03
家伙        2.17
dudes       2.13
Activations Density 0.020%