INDEX
Explanations
phrases describing physical location and actions related to concealing objects
Neuron Alignment
Index   Value   % of L₁
1385    +0.10   0.3%
946     +0.08   0.2%
198     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
946     +0.10      0.04
505     +0.08      0.02
164     +0.08      0.03
Negative Logits
succede       -0.68
uhr           -0.55
antici        -0.54
impon         -0.54
Minangkabau   -0.54
conclud       -0.53
rheumat       -0.53
ineffec       -0.53
quí           -0.52
horrend       -0.51
Positive Logits
hidden    0.67
hiding    0.55
hidden    0.54
buried    0.53
Hidden    0.51
beneath   0.51
HIDDEN    0.50
löyty     0.50
storage   0.50
Hidden    0.49
Activations Density 0.303%