Explanations
references to the color pink or associated terms in various contexts
Head Attr Weights
0:0.05
1:0.03
2:0.25
3:0.10
4:0.15
5:0.05
6:0.02
7:0.02
8:0.06
9:0.14
10:0.05
11:0.02
Negative Logits
��: -1.81
BIL: -1.48
ngth: -1.43
gregation: -1.37
者: -1.36
izational: -1.33
=-=-=-=-: -1.32
utenberg: -1.31
ertation: -1.29
conclud: -1.28
Positive Logits
stro: 1.40
cheeks: 1.21
paper: 1.19
ilings: 1.18
ioxide: 1.16
olive: 1.15
reme: 1.13
rish: 1.12
cigarettes: 1.10
pires: 1.10
Activations Density 0.004%