INDEX
Explanations
references to capital punishment
Head Attr Weights
0:0.07
1:0.12
2:0.04
3:0.04
4:0.04
5:0.27
6:0.04
7:0.03
8:0.04
9:0.05
10:0.14
11:0.05
Negative Logits
CN            -2.06
downstream    -1.90
located       -1.90
locations     -1.71
ItemImage     -1.67
addons        -1.62
�             -1.58
overhe        -1.58
originate     -1.57
Marketplace   -1.56
Positive Logits
rul           2.15
querque       2.02
chwitz        1.98
ardless       1.96
punishment    1.90
arnaev        1.88
stigma        1.86
ga            1.76
sentencing    1.72
ety           1.72
Activations Density 0.001%