INDEX
Explanations
instructions or steps in a process
Neuron Alignment
Index   Value   % of L₁
1909    +0.07   0.2%
382     +0.07   0.2%
559     +0.07   0.2%
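The "% of L₁" column can be read as the share of a vector's L₁ norm carried by a single entry. The page does not state how the column is computed, so the sketch below is an assumption, and `l1_share` and the toy vector are hypothetical names and values used only for illustration.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means abs(entry) / sum(abs(all entries)).
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy vector, not data from the dashboard.
toy_weights = [0.5, -0.25, 0.25]
print(l1_share(toy_weights, 0))  # 50.0
```

Under this reading, a value of 0.07 at 0.2% would imply an L₁ norm of roughly 35 for the full vector.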
Correlated Neurons
Index   P. Corr.   Cos Sim.
1162    +0.07      0.02
844     +0.07      0.02
539     +0.07      0.02
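"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity. A minimal sketch of both measures follows. Which vectors the dashboard actually compares (activations over tokens, weight directions, or something else) is not stated here, so the inputs and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy inputs, not data from the dashboard.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

The two measures differ only in the mean-centering step, which is why they can disagree for the same pair, as with +0.07 against 0.02 in the table.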
Negative Logits
rafra     -0.92
increa    -0.86
simplif   -0.83
Juf       -0.82
exé       -0.80
Perci     -0.77
effe      -0.77
tldr      -0.76
lidl      -0.74
mef       -0.74
Positive Logits
antes        0.63
beforehand   0.62
before       0.55
prior        0.55
<bos>        0.53
先           0.52
voordat      0.52
before       0.51
notice       0.51
перед        0.50
Activations Density 0.178%
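Activation density is commonly read as the percentage of sampled tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of zero. The function name, the threshold, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: one active token out of four.
print(activation_density([0.0, 0.0, 0.5, 0.0]))  # 25.0
```

On this reading, a density of 0.178% would correspond to roughly 1 active token in every 560 sampled.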