INDEX
Explanations
mentions of stacks or arrangements of objects
Neuron Alignment
Index   Value   % of L₁
1581    +0.14   0.7%
442     +0.14   0.6%
783     +0.14   0.6%
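A minimal sketch of how a neuron-alignment table like this could be produced. It assumes that the feature's decoder vector is read over the model's neurons, that Value is the signed weight on each neuron, and that % of L₁ is that weight's magnitude divided by the L₁ norm of the whole vector. The function name `neuron_alignment` and the toy weights are hypothetical and are not this feature's real data.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k largest-magnitude entries of a feature's decoder vector
    as (neuron index, signed value, percent of the vector's L1 norm).

    Assumption (hypothetical definition): % of L1 = |value| / sum(|w|) * 100.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful alignment
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1)
            for i in ranked[:k]]


# Toy decoder vector over six neurons (hypothetical values).
w = [0.01, -0.02, 0.14, 0.05, -0.10, 0.02]
rows = neuron_alignment(w)
for idx, value, pct in rows:
    print(f"{idx:>5}  {value:+.2f}  {pct:.1f}%")
```

Ranking by absolute value lets strongly negative weights appear alongside positive ones, which is why the Value column carries an explicit sign.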
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
537     +0.14           0.03
442     +0.14           0.03
783     +0.14           0.03
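A minimal sketch of the two similarity scores in this table, assuming both compare the feature's activations with a single neuron's activations over the same sequence of tokens. Pearson correlation subtracts each series' mean before comparing, while cosine similarity does not, so the two can differ noticeably for sparse, mostly-zero feature activations. The activation values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length activation series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def cosine(x, y):
    """Cosine similarity between two activation series (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Hypothetical activations of the feature and of one neuron on five tokens.
feature_acts = [0.0, 2.0, 0.0, 1.0, 0.0]
neuron_acts = [0.1, 0.9, -0.2, 0.4, 0.0]
print(round(pearson(feature_acts, neuron_acts), 3),
      round(cosine(feature_acts, neuron_acts), 3))
```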
Negative Logits
<bos>      -1.75
Francis    -0.55
Ethan      -0.49
private    -0.49
conduct    -0.48
James      -0.48
Шо         -0.48
ξ          -0.48
Francis    -0.48
such       -0.48
Positive Logits
stack      1.34
stacks     1.30
Pile       1.30
Stack      1.28
pile       1.27
Stacks     1.23
stack      1.22
STACK      1.18
stacking   1.18
Stack      1.17
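A minimal sketch of how positive and negative logit lists like the two above might be derived. It assumes that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix, and that the lists show the highest and lowest scores. The function name `logit_effects`, the two-dimensional toy model, and its four-token vocabulary are all hypothetical.

```python
def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Return the k most positive and k most negative per-token scores.

    decoder_dir: list of d_model floats (the feature's output direction).
    W_U: d_model rows, each with one entry per vocabulary token.
    Assumption: score(token) = sum_j decoder_dir[j] * W_U[j][token].
    """
    n_vocab = len(vocab)
    effects = [sum(decoder_dir[j] * W_U[j][t] for j in range(len(decoder_dir)))
               for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: effects[t])  # ascending
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of four tokens.
decoder_dir = [1.0, 0.5]
W_U = [[1.0, -1.0, 0.2, 0.0],
       [0.4, 0.0, -0.6, 2.0]]
vocab = [" stack", "<bos>", " such", " pile"]
positive, negative = logit_effects(decoder_dir, W_U, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```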
Activation Density: 0.308%
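A minimal sketch of an activation-density figure, assuming it is the percentage of tokens on which the feature's activation is strictly positive. Under that reading, 0.308% means the feature fires on roughly 3 tokens in every 1,000. The token counts below are hypothetical.

```python
def activation_density(acts):
    """Percentage of tokens on which the feature fires (activation > 0)."""
    if not acts:
        return 0.0
    return 100.0 * sum(1 for a in acts if a > 0) / len(acts)


# Hypothetical run: the feature fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [0.8, 1.3, 0.2]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```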