INDEX
Explanations
themes of societal expectations and gender roles
Negative Logits
depreci   -0.15
brick     -0.14
rawer     -0.14
éĿĪ       -0.14
unner     -0.14
nerg      -0.13
оÑĢаз     -0.13
ðŁĶ       -0.13
nackt     -0.13
ادا       -0.13
Positive Logits
sweetness  0.35
innocent   0.31
sweet      0.31
gentle     0.30
sugar      0.30
sug        0.29
gent       0.29
soft       0.29
nic        0.27
nice       0.27
Activations Density: 0.490%