INDEX
Explanations
instructions or steps in a process
Neuron Alignment
Index   Value   % of L₁
876     +0.10   0.3%
1531    +0.09   0.2%
690     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2044    +0.10      0.06
1415    +0.09      0.02
646     +0.08      0.03
Negative Logits
Khart     -1.13
Keny      -1.06
Juf       -1.03
Abbé      -1.03
maneu     -1.00
emphat    -1.00
Minang    -0.96
Hæ        -0.96
volunte   -0.96
depic     -0.94
Positive Logits
easy           0.75
effortless     0.73
easier         0.72
simply         0.72
simplicity     0.71
simpler        0.70
bersicht       0.69
easy           0.69
effortlessly   0.69
easily         0.68
Activations Density 0.408%