INDEX
Explanations
the presence of specific names or proper nouns
Head Attr Weights
0:0.08
1:0.08
2:0.06
3:0.08
4:0.09
5:0.07
6:0.09
7:0.08
8:0.07
9:0.09
10:0.06
11:0.08
Negative Logits
lett: -2.38
Afric: -2.29
Letters: -2.19
OPLE: -2.06
"],": -2.05
Roses: -2.04
ILCS: -2.01
ann: -2.00
Sources: -1.97
Cabinet: -1.97
Positive Logits
tracking: 2.54
ptions: 2.24
perfect: 2.17
erest: 2.07
progress: 2.07
clair: 2.06
tame: 2.00
vant: 2.00
zech: 2.00
keley: 2.00
Activations Density 0.000%