Explanations
terms related to the LGBTQ+ community, specifically lesbian individuals
references to lesbian and LGBTQ+ themes
Negative Logits
Token     Logit
ULE       -0.82
方        -0.81
EY        -0.80
frames    -0.74
rex       -0.72
Anim      -0.71
schild    -0.71
AH        -0.70
ROR       -0.70
lamm      -0.69
Positive Logits
Token          Logit
lesbian         0.99
couples         0.95
ism             0.90
Lesbian         0.87
sex             0.87
bisexual        0.87
emancipation    0.86
ization         0.86
separat         0.86
heterosexual    0.81
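Each logit list ranks vocabulary tokens by how much the feature pushes their logits up (positive) or down (negative). Below is a minimal pure-Python sketch of one common way to produce such a ranking, assuming the scores come from multiplying the feature's decoder vector through the model's unembedding matrix; the dashboard may compute or normalize its values differently. The helper names `logit_effects` and `top_and_bottom`, and the random toy weights, are hypothetical.

```python
import random

def logit_effects(w_dec_row, w_unembed):
    # Hypothetical helper. w_dec_row: d_model floats (the feature's decoder vector).
    # w_unembed: d_model rows x vocab_size columns. Returns one score per token.
    vocab_size = len(w_unembed[0])
    return [
        sum(w_dec_row[d] * w_unembed[d][t] for d in range(len(w_dec_row)))
        for t in range(vocab_size)
    ]

def top_and_bottom(effects, vocab, k=10):
    # Hypothetical helper: sort (token, score) pairs ascending by score.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy weights for illustration only (not real model parameters).
random.seed(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
w_unembed = [[random.gauss(0, 1) for _ in range(vocab_size)] for _ in range(d_model)]
w_dec_row = [random.gauss(0, 1) for _ in range(d_model)]

effects = logit_effects(w_dec_row, w_unembed)
negative, positive = top_and_bottom(effects, vocab, k=3)
for token, value in positive:
    print(f"{token:>8} {value:+.2f}")
```

Sorting once and slicing from both ends yields both lists from a single pass, which mirrors how the two tables share one underlying score per token.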
Activation density: 0.009%
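Activation density is commonly read as the fraction of sampled tokens on which the feature's activation is above zero. A minimal sketch under that assumption; the function name `activation_density` and the zero threshold are illustrative, not taken from the dashboard.

```python
def activation_density(feature_acts, threshold=0.0):
    # Hypothetical helper: fraction of tokens whose activation exceeds the threshold.
    fired = sum(1 for a in feature_acts if a > threshold)
    return fired / len(feature_acts)

# A feature that fires on 9 of 100,000 tokens has a density of 0.009%.
acts = [0.0] * 100_000
for i in range(9):
    acts[i] = 1.5
print(f"{activation_density(acts):.3%}")  # 0.009%
```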