INDEX
Explanations
terms related to gender and sexual identity
Negative Logits
Token     Logit
ÏĢον      -0.09
reo       -0.08
ious      -0.07
apter     -0.07
evet      -0.07
rias      -0.07
Gratis    -0.07
istica    -0.07
VOKE      -0.07
plies     -0.07
Positive Logits
Token     Logit
uchen     0.06
or        0.06
ones      0.06
aha       0.06
orge      0.06
Johns     0.05
orr       0.05
102       0.05
BA        0.05
equip     0.05
Activations Density: 0.002%