INDEX
Explanations
terms related to gender identity and gender equality
Negative Logits
Token     Logit
yan       -0.17
ãĥ«ãĥĪ    -0.16
yun       -0.15
eltas     -0.14
yas       -0.14
vals      -0.14
ivr       -0.14
lashes    -0.14
sie       -0.14
deo       -0.14
Positive Logits
Token     Logit
ed        0.37
roles     0.25
edn       0.25
que       0.23
less      0.21
fluid     0.21
-neutral  0.21
Roles     0.21
-role     0.21
edBy      0.20
Activations Density: 0.011%