Explanations
phrases related to societal expectations, especially regarding appearance and gender
attitudes and beliefs about gender and physical appearance
Negative Logits
Ples            -0.95
cussion         -0.77
resumed         -0.76
glers           -0.75
Meteor          -0.71
clinton         -0.70
seism           -0.69
ransomware      -0.68
EStream         -0.68
laun            -0.68
Positive Logits
superiority     1.28
masculinity     1.27
individuality   1.22
attractiveness  1.21
feminine        1.17
femin           1.17
uniqueness      1.15
masculine       1.13
worthiness      1.10
inferior        1.10
Activations Density: 0.596%