INDEX
Explanations
phrases related to controversial or sensitive topics, such as LGBTQ+ rights, political issues, and social justice
references to individuals and their personal connections or identities
Head Attr Weights
0:0.12
1:0.03
2:0.12
3:0.14
4:0.06
5:0.09
6:0.04
7:0.04
8:0.06
9:0.11
10:0.08
11:0.05
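As a reading aid, the head attribution weights listed above can be ranked and summed with a few lines of Python. This is only a sketch: the dictionary values are copied from the listing, while the names `head_attr_weights` and `top_heads` are hypothetical and not part of the source page. The source does not say whether the weights are meant to sum to 1, so the total is reported without interpretation.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# The variable and function names are illustrative assumptions.
head_attr_weights = {
    0: 0.12, 1: 0.03, 2: 0.12, 3: 0.14, 4: 0.06, 5: 0.09,
    6: 0.04, 7: 0.04, 8: 0.06, 9: 0.11, 10: 0.08, 11: 0.05,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights.

    Ties keep their original order because Python's sort is stable.
    """
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the displayed (rounded) weights.
total = round(sum(head_attr_weights.values()), 2)

print(top_heads(head_attr_weights))  # [(3, 0.14), (0, 0.12), (2, 0.12)]
print(total)                         # 0.94
```

On these values, head 3 carries the largest weight, followed by heads 0 and 2, and the displayed weights total 0.94.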
Negative Logits
��: -1.24
etheus: -1.17
skelet: -1.15
Published: -1.05
Spiegel: -1.04
isphere: -1.04
PDATE: -1.03
GBT: -1.02
FontSize: -1.00
PsyNetMessage: -1.00
Positive Logits
flies: 1.21
illin: 1.18
Chicken: 1.18
oglu: 1.18
Farms: 1.11
Drops: 1.09
ynski: 1.08
uan: 1.07
ille: 1.06
Slime: 1.06
Activations Density 0.036%