Explanations
numerical values, particularly highlighting significant statistics or data points
Neuron Alignment
Index | Value | % of L₁
458 | +0.16 | 0.9%
362 | +0.13 | 0.7%
81 | +0.13 | 0.7%
Correlated Neurons
Index | P. Corr. | Cos Sim.
199 | +0.16 | 0.04
12 | +0.13 | 0.03
458 | +0.13 | 0.03
Negative Logits
ĨĴ | -1.75
»¿ | -1.70
¿½ | -1.64
uppose | -1.61
pity | -1.58
Ĥ¬ | -1.54
ĸ´ | -1.51
rodents | -1.45
rept | -1.45
ats | -1.44
Positive Logits
states | 1.50
dressing | 1.44
senses | 1.41
(, | 1.41
ermost | 1.34
dock | 1.34
enson | 1.34
inski | 1.33
sky | 1.33
mann | 1.33
Activations Density: 1.474%