INDEX
Explanations
terms related to women or femininity
references to women and feminine concepts
Negative Logits
Token      Logit
ritch      -0.65
tolerance  -0.61
rooms      -0.60
rays       -0.58
tub        -0.58
OUT        -0.57
oats       -0.56
ray        -0.56
rete       -0.56
ting       -0.55
Positive Logits
Token      Logit
iak        0.98
ko         0.87
ī          0.84
Īè         0.81
ovic       0.78
omen       0.78
alia       0.78
eday       0.77
ksh        0.77
stru       0.77
Activations Density: 0.039%