INDEX
Explanations
technical or programming-related syntactical structures
Neuron Alignment
Index   Value   % of L₁
150     +0.12   0.7%
369     +0.11   0.6%
474     +0.11   0.6%
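The "% of L₁" column presumably reports each neuron's absolute weight as a share of the vector's total L₁ norm; that reading is an assumption, since the dashboard does not define the column. A minimal sketch under that assumption, using a hypothetical helper `l1_shares` and a toy vector that is not taken from the dashboard:

```python
def l1_shares(weights):
    """Return each entry's absolute value as a percentage of the L1 norm.

    Assumption: "% of L1" means 100 * |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        # An all-zero vector has no meaningful shares; report zeros.
        return [0.0 for _ in weights]
    return [100.0 * abs(w) / l1 for w in weights]


# Hypothetical toy vector, chosen only for illustration.
toy = [0.5, -0.25, 0.25]
shares = l1_shares(toy)
for index, (w, pct) in enumerate(zip(toy, shares)):
    print(f"{index}\t{w:+.2f}\t{pct:.1f}%")
```

Under this reading, small percentages such as those in the table would indicate that the feature's weight is spread across many neurons rather than concentrated in a few.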
Correlated Neurons
Index   P. Corr.   Cos Sim.
380     +0.12      0.07
150     +0.11      0.09
16      +0.11      0.06
Negative Logits
itself     -1.80
UES        -1.71
uncher     -1.64
later      -1.64
reasons    -1.54
myself     -1.52
gue        -1.49
ubicin     -1.48
"...       -1.46
europea    -1.45
Positive Logits
agen       1.60
$('        1.51
equal      1.45
imension   1.43
lich       1.42
emph       1.39
'>         1.36
$('.       1.34
grass      1.33
ently      1.30
Activations Density 1.879%