Explanations
Quotes that express opinions or statements about individuals or groups
Head Attribution Weights (head: weight)
0: 0.20
1: 0.04
2: 0.06
3: 0.12
4: 0.03
5: 0.16
6: 0.05
7: 0.04
8: 0.04
9: 0.07
10: 0.11
11: 0.04
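As a quick reading aid, the listed head weights can be ranked to see which heads dominate. The sketch below is a minimal Python example that only re-uses the twelve values shown above; the variable names are illustrative. It treats the weights as comparable shares of a total, which is an interpretation rather than something the page states.

```python
# Head attribution weights as listed above (head index -> weight).
head_weights = {0: 0.20, 1: 0.04, 2: 0.06, 3: 0.12, 4: 0.03, 5: 0.16,
                6: 0.05, 7: 0.04, 8: 0.04, 9: 0.07, 10: 0.11, 11: 0.04}

# Sort heads by weight, largest first.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top3 = ranked[:3]

# Total of all listed weights, and the share held by the top three heads.
total = sum(head_weights.values())
top3_share = sum(w for _, w in top3)

print([h for h, _ in top3])  # [0, 5, 3]
print(round(total, 2))       # 0.96
print(round(top3_share, 2))  # 0.48
```

Heads 0, 5 and 3 together account for 0.48 of the listed total. The values sum to 0.96 rather than 1.00, which may simply reflect rounding to two decimals.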
Negative Logits
byss  -1.30
魔  -1.21
qu  -1.18
ⓘ  -1.14
UCHIJ  -1.13
quished  -1.13
� (token not displayable as text)  -1.11
:=  -1.09
【  -1.08
gal  -1.05
Positive Logits
.")  2.28
』  2.13
),"  2.09
)."  2.02
]."  1.95
']  1.89
").  1.88
"]  1.86
"),  1.85
")  1.83
Activation Density: 0.007%
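The density figure can be turned into a more tangible rate. This sketch assumes density means the fraction of tokens on which the feature has a nonzero activation, the usual convention for such dashboards but not stated on this page; the variable names are illustrative.

```python
# Activation density as shown on the page, in percent.
density_percent = 0.007

# Assumption: density = fraction of tokens with a nonzero feature activation.
fraction = density_percent / 100

# Expected number of activating tokens per million tokens.
per_million = fraction * 1_000_000

# Average number of tokens between activations.
tokens_per_activation = 1 / fraction

print(round(per_million))            # 70
print(round(tokens_per_activation))  # 14286
```

Under that assumption, 0.007% corresponds to roughly 70 activating tokens per million, or about one in every 14,300 tokens.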