INDEX
Explanations
references to societal pressures and expectations surrounding gender roles in professional settings
Negative Logits
ãĥ¼ãĥĩ    -0.17
eskort    -0.16
letic    -0.15
ksam    -0.15
å¢    -0.14
NST    -0.14
loff    -0.14
ãĤ¤ãĤº    -0.14
volatile    -0.13
onis    -0.13
Positive Logits
une    0.15
bg    0.15
éĺ    0.14
blah    0.14
alleged    0.14
threat    0.14
icks    0.14
926    0.14
бÑĥдÑĤо    0.14
somehow    0.14
Activations Density 0.221%