INDEX
Explanations
mentions of LGBTQ-related themes
Negative Logits
ëį°    -0.22
ร    -0.20
न    -0.19
ums    -0.18
ily    -0.17
majority    -0.17
ãģ¨ãģĵãĤį    -0.16
ष    -0.15
ãģĤãĤĬ    -0.15
ãģįãģŁ    -0.15
Positive Logits
eenth    0.17
————————————————    0.17
ed    0.17
à¸Ļ    0.17
ãģĦãģ¾ãģĻ    0.16
chy    0.15
ëĭ¤ëĬĶ    0.15
edl    0.15
/cop    0.15
aroo    0.15
Activations Density: 0.200%