Explanations
instructions or steps in a process (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
1334    +0.15   0.5%
752     +0.13   0.4%
161     +0.12   0.4%
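The dashboard does not show how these columns are derived. Below is a minimal Python sketch under one plausible reading, which is an assumption: "Value" is the feature's decoder weight on a given MLP neuron, and "% of L₁" is that weight's magnitude divided by the L₁ norm of the whole decoder vector. The function `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vector, k=3):
    """Return (neuron index, weight, share of L1 norm) for the k largest-magnitude entries.

    Assumes decoder_vector[i] is the feature's weight on MLP neuron i (hypothetical layout).
    """
    l1 = sum(abs(w) for w in decoder_vector)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Rank neuron indices by the magnitude of their weight, largest first.
    ranked = sorted(range(len(decoder_vector)),
                    key=lambda i: abs(decoder_vector[i]), reverse=True)
    return [(i, decoder_vector[i], abs(decoder_vector[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector (hypothetical values, not from this feature).
print(neuron_alignment([0.0, 0.5, -0.125, 0.375]))
# → [(1, 0.5, 0.5), (3, 0.375, 0.375), (2, -0.125, 0.125)]
```

Under this reading, a top share of only 0.5% would mean the feature's decoder vector is spread across many neurons rather than concentrated on neuron 1334.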
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1334    +0.15           0.09
161     +0.13           0.06
395     +0.12           0.07
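A sketch of the two Correlated Neurons statistics, assuming both compare the feature's activations with a neuron's activations over the same set of tokens (the dashboard does not state this). Pearson correlation centers each series before comparing; cosine similarity does not, so a constant offset in a neuron's activation changes cosine similarity but leaves Pearson correlation unchanged. All function names and activation values here are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance of the centered series over the product of their norms."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(xs, ys):
    """Cosine similarity: like Pearson, but without subtracting the means."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature, highest first.

    neuron_acts[j] holds neuron j's activations on the same tokens as feature_acts.
    """
    rows = [(j, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for j, acts in enumerate(neuron_acts)]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:k]


# Hypothetical activations over five tokens (not data from this feature).
feature = [0.0, 1.0, 0.0, 0.0, 2.0]
neurons = [
    [0.1, 0.9, 0.2, 0.0, 1.8],  # tracks the feature closely
    [1.0, 0.0, 1.0, 1.0, 0.0],  # active where the feature is not
    [0.5, 0.6, 0.4, 0.5, 0.7],  # small swings on top of a constant offset
]
for j, r, c in correlated_neurons(feature, neurons):
    print(j, round(r, 2), round(c, 2))
```

`pearson` fails on a constant series and `cosine` on an all-zero one; a real implementation would need to guard those cases.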
Negative Logits
ECTION       -0.69
ecuted       -0.67
intenant     -0.66
FBref        -0.65
ValueStyle   -0.64
getRuntime   -0.63
Shetterly    -0.62
OURCE        -0.62
Obrador      -0.61
ECONDS       -0.61
Positive Logits
reluct       1.43
impra        1.41
disagre      1.25
unspeak      1.22
indestru     1.19
shenan       1.13
inconce      1.12
disreg       1.11
impractica   1.11
apprehen     1.10
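One common way to produce logit lists like the two above is a direct-effect calculation: take the feature's output direction in the residual stream, dot it with each token's unembedding vector, and report the lowest and highest scores. The sketch below assumes the direction has already been mapped into the residual stream and ignores the final layer norm; the function `logit_effects`, the vocabulary, and all vectors are hypothetical.

```python
def logit_effects(direction, unembed, k=3):
    """Direct effect on each token's logit of adding `direction` to the residual stream.

    unembed maps each token string to its unembedding vector (one column of W_U).
    The final layer norm is ignored, so each score is a single dot product.
    Returns (k most negative, k most positive), each a list of (token, score).
    """
    scores = {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in unembed.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:k], ranked[::-1][:k]


# Hypothetical 2-dimensional residual stream and four-token vocabulary.
unembed = {
    "a": [2.0, 0.5],
    "b": [-1.0, 0.0],
    "c": [0.0, 1.0],
    "d": [-0.5, -2.0],
}
negative, positive = logit_effects([1.0, 0.5], unembed, k=2)
print("negative:", negative)  # → negative: [('d', -1.5), ('b', -1.0)]
print("positive:", positive)  # → positive: [('a', 2.25), ('c', 0.5)]
```

Ignoring the final layer norm keeps the effect linear in the direction, which is why one dot product per token is enough in this approximation.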
Activation Density: 0.346%
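Activation density is most likely the fraction of dataset tokens on which the feature is active. A minimal sketch, assuming "active" means an activation above a threshold of zero (the threshold is a parameter because the dashboard does not specify one); `activation_density` is a hypothetical helper.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


# Hypothetical activations on eight tokens, one of them nonzero.
print(f"{activation_density([0.0] * 7 + [1.3]):.3%}")  # → 12.500%
```

At 0.346%, the feature would fire on about 3,460 of every million tokens, roughly one token in 290.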