INDEX
Explanations
terms related to LGBTQ+ topics
Negative Logits
Token     Logit
ICC       -0.63
ERAL      -0.62
uania     -0.61
ERSON     -0.60
bats      -0.60
sonian    -0.60
iaries    -0.59
Hes       -0.58
lessly    -0.58
TODAY     -0.57
Positive Logits
Token     Logit
erness    1.25
zon       1.14
uing      1.09
ues       1.02
ued       1.00
edo       0.97
que       0.96
asy       0.94
ue        0.87
enne      0.86
Activations Density: 0.026%