Explanations
mentions of LGBTQ+ community-related terms
content related to the LGBTQ community and its issues
Negative Logits
amina                -0.68
respir               -0.65
acca                 -0.64
igne                 -0.63
Wolver               -0.63
pered                -0.62
urers                -0.62
Phys                 -0.61
guiActiveUnfocused   -0.61
reper                -0.60
Positive Logits
uably                0.84
Spectrum             0.79
spectrum             0.76
IQ                   0.75
LGBTQ                0.75
LGBT                 0.74
ecided               0.74
istani               0.72
eatures              0.71
Leaks                0.71
Activations Density: 0.008%