INDEX
Explanations
locations or positions within a sequence that denote a specific point
Neuron Alignment
Index    Value    % of L₁
674      +0.16    0.5%
314      +0.10    0.3%
765      +0.10    0.3%
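The "% of L₁" column is not defined on this page. One reading, assumed here and not confirmed by the source, is that each entry is the absolute weight on that neuron divided by the L₁ norm of the full weight vector. Under that reading, +0.16 at 0.5% implies an L₁ norm of roughly 32, and 0.10 / 32 is about 0.3%, which agrees with the other two rows up to rounding. A minimal sketch with made-up weights (the function name `l1_share` and the toy vector are hypothetical):

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: this is what the "% of L1" column reports; the page
    itself does not define the column.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("weight vector has zero L1 norm")
    return abs(weights[index]) / l1_norm


# Toy vector for illustration only, not data from this page.
toy_weights = [0.5, -0.25, 0.25, 0.0]
print(f"{l1_share(toy_weights, 0):.1%}")
```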
Correlated Neurons
Index    P. Corr.    Cos Sim.
314      +0.16       0.02
765      +0.10       0.03
1639     +0.10       0.02
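The two columns measure different things, which is why a neuron can show a Pearson correlation of +0.16 with a cosine similarity of only 0.02. The page does not say which vectors are compared in each column, so the sketch below only illustrates the two formulas on toy lists. Pearson correlation is the cosine similarity of the mean-centered vectors; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy values for illustration only, not data from this page.
a = [1.0, 2.0, 3.0]
b = [3.0, 4.0, 5.0]
print(pearson_correlation(a, b), cosine_similarity(a, b))
```

On the toy input the two lists differ only by a constant offset, so the correlation is perfect while the cosine similarity is slightly below 1.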
Negative Logits
Token        Logit
unspeak      -0.55
℠            -0.53
hairc        -0.49
bapt         -0.48
apprehen     -0.48
vuol         -0.47
succede      -0.47
Dangers      -0.47
purtroppo    -0.46
Dims         -0.46
Positive Logits
Token        Logit
Middle       0.69
Middle       0.69
middle       0.67
MIDDLE       0.65
middle       0.63
Mid          0.60
mid          0.59
MIDDLE       0.58
<bos>        0.58
raggiunto    0.56
Activations Density: 0.091%
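The page does not state how the density figure is computed. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero, so 0.091% would mean roughly 9 active tokens per 10,000. A minimal sketch under that assumption (the function name, the threshold parameter, and the toy data are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires; the page does not define the term.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations for illustration only: one active token out of ten.
toy_activations = [0.0] * 9 + [1.5]
print(f"{activation_density(toy_activations):.3%}")
```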