Neuron Alignment
Index   Value   % of L₁
2034    +0.25   0.9%
1535    +0.23   0.8%
382     +0.20   0.7%
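The Neuron Alignment table lists the neurons that this feature's decoder weights load on most heavily. Below is a minimal sketch of how such a table could be computed. It makes several assumptions: `w_dec_feature` (a hypothetical name) is the feature's decoder vector with one weight per neuron, rows are ranked by absolute weight, and "% of L₁" is |wᵢ| divided by the L₁ norm of the whole vector. The dashboard's actual implementation may differ.

```python
import numpy as np

def neuron_alignment(w_dec_feature, k=3):
    """Top-k neurons by |decoder weight| for one feature (assumed definition).

    Returns (neuron_index, signed_weight, fraction_of_L1) tuples, where
    fraction_of_L1 = |w_i| / sum_j |w_j|.
    """
    w = np.asarray(w_dec_feature, dtype=float)
    l1 = np.abs(w).sum()
    # Rank by magnitude (our assumption); stable sort keeps ties in index order.
    top = np.argsort(-np.abs(w), kind="stable")[:k]
    return [(int(i), float(w[i]), float(abs(w[i]) / l1)) for i in top]
```

For example, `neuron_alignment([0.125, -0.5, 0.25, 0.125], k=2)` ranks neuron 1 first (weight -0.5, half of the L₁ norm) and neuron 2 second.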
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
382     +0.25           0.12
1535    +0.23           0.09
610     +0.20           0.07
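Pearson correlation and cosine similarity measure how closely each neuron's activations track this feature's activations across a batch of tokens. The sketch below rests on a few assumptions. Both metrics are computed over the same tokens: Pearson correlation on mean-centred activations, cosine similarity on the raw (uncentred) ones. Neurons are ranked by Pearson correlation. `feature_acts` and `neuron_acts` are hypothetical inputs.

```python
import numpy as np

def correlated_neurons(feature_acts, neuron_acts, k=3, eps=1e-12):
    """Rank neurons by Pearson correlation with a feature (assumed definition).

    feature_acts: shape (n_tokens,); neuron_acts: shape (n_tokens, n_neurons).
    Returns (neuron_index, pearson, cosine_similarity) tuples.
    """
    f = np.asarray(feature_acts, dtype=float)
    n = np.asarray(neuron_acts, dtype=float)
    # Pearson correlation = cosine similarity of mean-centred vectors.
    fc = f - f.mean()
    nc = n - n.mean(axis=0)
    pearson = (nc.T @ fc) / (np.linalg.norm(nc, axis=0) * np.linalg.norm(fc) + eps)
    # Plain cosine similarity on the uncentred activations.
    cos = (n.T @ f) / (np.linalg.norm(n, axis=0) * np.linalg.norm(f) + eps)
    top = np.argsort(-pearson, kind="stable")[:k]
    return [(int(i), float(pearson[i]), float(cos[i])) for i in top]
```

The centred and uncentred versions can disagree substantially when activations are sparse and mostly zero, which is one reason a dashboard might show both columns.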
Negative Logits
Lma     -1.16
sappi   -1.16
Ikr     -1.07
Lmfao   -1.06
Noice   -1.06
FTFY    -1.04
stihl   -1.03
uncin   -1.03
vogli   -1.01
gonz    -1.01
Positive Logits
Therefore     0.80
↵↵            0.78
Whilst        0.77
Also          0.76
However       0.75
They          0.74
This          0.73
<eos>         0.73
Furthermore   0.72
Hence         0.71
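The Negative and Positive Logits lists show which output tokens the feature most suppresses and most promotes. One common way to obtain such lists is to project the feature's decoder direction through the unembedding matrix and take the k lowest and highest entries. The sketch below uses that approach as an assumption, not as the dashboard's confirmed method. It ignores the final LayerNorm for simplicity, and `w_dec_feature`, `W_U`, and `vocab` are hypothetical inputs.

```python
import numpy as np

def logit_effects(w_dec_feature, W_U, vocab, k=10):
    """Direct effect of a feature's decoder direction on output logits.

    w_dec_feature: shape (d_model,); W_U: shape (d_model, vocab_size);
    vocab: list of token strings indexed like W_U's columns.
    Returns (negative, positive) lists of (token, logit_effect) pairs.
    """
    logits = np.asarray(w_dec_feature, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(logits, kind="stable")  # ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

Under this reading, the lists above suggest the feature promotes discourse connectives such as "Therefore" and "However" and suppresses informal tokens such as "Lmfao" and "Ikr".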
Activation Density: 0.489%
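We assume activation density is the fraction of sampled tokens on which the feature fires, meaning its activation is above zero. Under that reading, 0.489% corresponds to roughly 5 in every 1,000 tokens. The sketch below implements that assumption. `feature_acts` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(feature_acts, dtype=float)
    return float((acts > threshold).mean())
```

Formatting the result as a percentage, for example with `f"{100 * activation_density(acts):.3f}%"`, gives a figure in the same style as the value above.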