INDEX
Explanations
references to the word "Que" and its variations, indicating a focus on LGBTQ+ themes
Negative Logits
tha       -0.17
igid      -0.15
ont       -0.15
shop      -0.15
son       -0.14
phin      -0.14
aret      -0.14
rade      -0.14
sWith     -0.14
adies     -0.14
Positive Logits
ixer      0.18
ijo       0.17
ued       0.17
ens       0.17
estion    0.16
iro       0.16
iros      0.16
Que       0.16
jas       0.15
366       0.15
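Dashboards of this kind usually derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes that procedure; the function names `logit_effects` and `top_k` and all the numbers are hypothetical toy values, not this feature's real weights, and the actual dashboard pipeline may differ.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> List[Tuple[str, float]]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix (d_model rows x len(vocab) columns)."""
    d_model = len(decoder_dir)
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        score = sum(decoder_dir[k] * unembed[k][j] for k in range(d_model))
        effects.append((token, score))
    return effects

def top_k(effects, k, positive=True):
    """Return the k largest (positive=True) or smallest (positive=False) effects."""
    return sorted(effects, key=lambda pair: pair[1], reverse=positive)[:k]

# Toy, made-up numbers (hypothetical, not this feature's weights): d_model = 2, 4 tokens.
vocab = ["Que", "tha", "ixer", "shop"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.1, 0.0],
    [0.1, 0.1, -0.2, 0.4],
]
effects = logit_effects(decoder_dir, unembed, vocab)
print("positive:", [t for t, _ in top_k(effects, 2)])                  # ['ixer', 'Que']
print("negative:", [t for t, _ in top_k(effects, 2, positive=False)])  # ['shop', 'tha']
```

With real weights the same projection would run over the full vocabulary, and the ten highest and ten lowest scores would populate the two lists.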
Activation Density: 0.008%
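Activation density is typically the share of token positions on which the feature fires at all. The minimal sketch below assumes "fires" means an activation strictly above zero; the helper name `activation_density` and the sample data are hypothetical. It shows that 0.008% corresponds to roughly 8 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation is above `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: the feature fires on 8 of 100,000 tokens.
sample = [0.0] * 99_992 + [1.3] * 8
print(f"{activation_density(sample):.3f}%")  # 0.008%
```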