Explanations
references to specific locations or contexts related to discoveries
Head Attribution Weights
Head   Weight
0      0.02
1      0.01
2      0.09
3      0.08
4      0.14
5      0.02
6      0.03
7      0.27
8      0.04
9      0.03
10     0.12
11     0.08
Negative Logits
Token                 Logit
verages               -1.82
adject                -1.61
ctions                -1.58
isSpecialOrderable    -1.44
pport                 -1.44
enance                -1.43
idity                 -1.41
ctive                 -1.40
patience              -1.36
opin                  -1.35
Positive Logits
Token                 Logit
Mystery               1.51
sparking              1.48
Mysteries             1.45
Pick                  1.45
ource                 1.41
Aerial                1.24
Steal                 1.24
uncover               1.24
illegally             1.22
unknown               1.22
Activation Density: 0.006%
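A density of 0.006% means the feature fires on roughly 6 of every 100,000 tokens, so it is very sparse. The sketch below shows that arithmetic. It assumes density is the percentage of token positions whose activation is above zero. The function name `activation_density`, its `threshold` parameter, and the example activations are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 6 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(6):
    acts[i * 1000] = 1.0

print(f"{activation_density(acts):.3f}%")  # 0.006%
```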