INDEX
Explanations
references to gender and sexual identity
Negative Logits
(ib        -0.16
shoulder   -0.15
aru        -0.15
tooth      -0.15
корист     -0.15
牙         -0.15
vit        -0.14
eting      -0.14
jaw        -0.14
Verb       -0.14
Positive Logits
fores      0.27
pub        0.27
vag        0.27
ure        0.26
vul        0.24
pen        0.24
penis      0.23
erect      0.21
hym        0.21
Pen        0.21
Activations Density: 0.037%