Explanations
terms related to gender identity and associated concepts
Negative Logits
kou     -0.17
Pist    -0.16
âķij     -0.15
rell    -0.14
ahan    -0.14
Kou     -0.14
ê½      -0.14
atak    -0.14
rog     -0.14
ltk     -0.14
Positive Logits
ÏĦÏİ     0.17
avras   0.15
r       0.15
elseif  0.15
arez    0.14
allas   0.14
studio  0.14
Tud     0.14
pro     0.14
Layers  0.14
Activations Density: 0.016%