Explanations
specialized codes or references in programming contexts, particularly related to libraries or packages
Neuron Alignment
Index   Value   % of L₁
410     +0.21   1.2%
369     +0.20   1.2%
203     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
410     +0.21      0.02
111     +0.20      0.02
214     +0.12      0.01
Negative Logits
pecially   -1.70
atin       -1.65
ighting    -1.62
ights      -1.62
illance    -1.59
ighter     -1.56
ibility    -1.51
piring     -1.51
porter     -1.49
oder       -1.49
Positive Logits
ĻĤ         1.73
nan        1.50
imm        1.46
meg        1.45
ops        1.35
trees      1.32
pointers   1.31
SHA        1.30
bears      1.30
bugs       1.30
Activations Density: 0.020%