INDEX
Explanations
technical terms related to programming and data structures
Neuron Alignment
Index   Value   % of L₁
416     +0.10   0.6%
369     +0.10   0.6%
265     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
416     +0.10      0.05
153     +0.10      0.08
265     +0.10      0.04
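The page does not say how the "P. Corr." and "Cos Sim." columns are computed. Assuming they are the usual Pearson correlation (between two activation series) and cosine similarity (between two vectors), a minimal sketch of both statistics follows. The function names `pearson` and `cosine` and the sample inputs are hypothetical and are not taken from the dashboard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and spreads are computed on mean-centred values.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (no mean-centring)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical activations, for illustration only.
    feature_acts = [0.0, 0.2, 0.0, 1.1, 0.4]
    neuron_acts = [0.1, 0.0, 0.3, 0.9, 0.2]
    print(pearson(feature_acts, neuron_acts))
    print(cosine(feature_acts, neuron_acts))
```

The two measures differ only in mean-centring: Pearson correlation subtracts each series' mean first, while cosine similarity does not. This is one reason the two columns can disagree for the same neuron.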
Negative Logits
branches     -1.46
franchise    -1.41
penicillin   -1.37
endorse      -1.35
tel          -1.34
whatever     -1.31
theirs       -1.31
directory    -1.30
fuck         -1.30
rid          -1.25
Positive Logits
<|outofrange|>         2.83
↵                      2.83
↵↵↵                    2.83
(token not rendered)   2.83
↵                      2.83
↵↵                     2.83
↵                      2.83
↵                      2.83
(token not rendered)   2.83
č↵                     2.83
Activations Density: 3.288%