Explanations
terms related to gender characteristics and their representations
feminine and masculine distinctions
Negative Logits
väg: -0.48
previs: -0.47
Decke: -0.46
Anexo: -0.46
passage: -0.45
Lihat: -0.44
梗: -0.44
stället: -0.44
provis: -0.43
Hift: -0.43
Positive Logits
Feminine: 0.96
feminine: 0.94
femininity: 0.81
feminino: 0.69
femeninos: 0.69
feminina: 0.68
femininas: 0.66
FEM: 0.65
femininos: 0.65
masculine: 0.64
Activation density: 0.005%
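The page does not say how these numbers are produced. The following is a minimal sketch, assuming the common sparse-autoencoder convention: the positive and negative logits are the feature's decoder direction multiplied by the model's unembedding matrix (ignoring the final layer norm), and activation density is the fraction of sampled tokens on which the feature's activation is above zero. The function names feature_logits and activation_density, and all toy values, are hypothetical.

```python
import numpy as np


def feature_logits(w_dec, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by the direct logit effect of one feature.

    w_dec:     decoder direction of the feature, shape (d_model,)
    w_unembed: unembedding matrix, shape (d_model, vocab_size)
    vocab:     list of token strings, length vocab_size
    Returns (positive, negative): the k highest and k lowest (token, value) pairs.
    Assumption: the final layer norm is ignored, as in many dashboards.
    """
    logits = w_dec @ w_unembed          # shape (vocab_size,)
    order = np.argsort(logits)          # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature activation exceeds threshold."""
    return float(np.mean(np.asarray(acts) > threshold))


if __name__ == "__main__":
    # Toy data (hypothetical values, not taken from the dashboard above).
    vocab = ["a", "b", "c", "d", "e"]
    w_dec = np.array([1.0, 0.0, 0.0, 0.0])
    w_unembed = np.zeros((4, 5))
    w_unembed[0] = [0.9, 0.5, -0.3, 0.1, -0.7]
    pos, neg = feature_logits(w_dec, w_unembed, vocab, k=2)
    print(pos)  # [('a', 0.9), ('b', 0.5)]
    print(neg)  # [('e', -0.7), ('c', -0.3)]

    acts = np.zeros(20_000)
    acts[0] = 3.0  # the feature fires on 1 of 20,000 tokens
    print(f"{activation_density(acts):.3%}")  # 0.005%
```

Under this reading, a density of 0.005% means the feature fires on about 1 token in 20,000, and the two logit lists show which output tokens its direction pushes up (forms of "feminine" across several languages) or down.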