INDEX
Explanations
mentions of LGBTQ-related terms
references to the LGBTQ community and related topics
Negative Logits
reper         -0.65
osaurs        -0.62
pered         -0.61
gio           -0.61
Rove          -0.60
Wolver        -0.60
scattering    -0.59
nings         -0.59
respir        -0.59
llular        -0.58
Positive Logits
Leaks         0.85
uably         0.81
WER           0.81
naire         0.80
istani        0.78
yan           0.72
oman          0.71
ecided        0.71
erness        0.71
endered       0.70
Activations Density 0.018%