Explanations
references to LGBTQ+ identities and events
Negative Logits
Token        Logit
rade         -0.15
Cres         -0.14
chip         -0.14
opposite     -0.14
Fighting     -0.14
リーズ       -0.13
zc           -0.13
366          -0.13
Nir          -0.13
elm          -0.13
Positive Logits
Token        Logit
eph          0.16
owel         0.16
iously       0.15
ted          0.15
monic        0.15
ears         0.15
kowski       0.14
buz          0.14
moment       0.14
毛           0.14
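The page does not say how the logit values in the two lists above were produced. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive tokens. The sketch below assumes that method; `logit_effects`, `decoder_dir`, and `w_unembed` are hypothetical names, and plain Python lists stand in for real model weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],  # hypothetical: shape [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project one feature's decoder direction through the unembedding
    (assumed method) and return the k most positive and the k most
    negative (token, logit) pairs."""
    d_model = len(decoder_dir)
    if len(w_unembed) != d_model:
        raise ValueError("w_unembed must have one row per model dimension")
    if any(len(row) != len(vocab) for row in w_unembed):
        raise ValueError("each w_unembed row must have one entry per token")

    # logits[j] = sum_i decoder_dir[i] * w_unembed[i][j]
    logits = [
        sum(decoder_dir[i] * w_unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]

    # Sort ascending by logit: the front holds the most negative tokens,
    # the back holds the most positive ones.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example with a 2-dimensional model and a 3-token vocabulary.
    pos, neg = logit_effects(
        [1.0, -0.5],
        [[0.2, -0.1, 0.0], [0.0, 0.2, -0.2]],
        ["a", "b", "c"],
        k=1,
    )
    print("positive:", pos)
    print("negative:", neg)
```

With real weights, `k=10` would yield two ten-row lists shaped like the tables above. Whether the dashboard applies any normalization (for example, to the decoder direction) is not stated on the page.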
Activation Density: 0.004%
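Activation density is usually read as the share of sampled tokens on which the feature fires. A density of 0.004% works out to about 4 tokens in every 100,000. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of zero. The function name and the `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`
    (assumed definition of "active")."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * active / total


if __name__ == "__main__":
    # 4 active tokens out of 100,000 sampled tokens.
    acts = [0.0] * 99_996 + [1.0] * 4
    print(f"{activation_density(acts):.3f}%")
```

At such a low density, only a handful of examples in any sample will activate the feature. This is worth keeping in mind when judging how well the short explanation covers its behavior.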