INDEX
Explanations
numerical values and mathematical expressions
Neuron Alignment
Index   Value   % of L₁
23      +0.16   0.9%
504     +0.13   0.7%
435     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
504     +0.16      0.02
443     +0.13      0.02
10      +0.12      0.03
Negative Logits
¯     -3.09
¬     -3.05
£     -3.00
¿½    -2.95
¦     -2.94
ł     -2.94
Ń     -2.87
±     -2.85
¸     -2.76
½     -2.74
Positive Logits
okay               1.66
alright            1.64
else               1.61
hip                1.58
//!                1.55
addEventListener   1.53
hline              1.49
+}                 1.48
woke               1.45
right              1.45
Activations Density: 0.133%