INDEX
Explanations
programming-related terms and code snippets
Neuron Alignment
Index   Value   % of L₁
876     +0.24   0.8%
1177    +0.21   0.7%
678     +0.16   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
678     +0.24      0.07
876     +0.21      -0.00
1871    +0.16      0.05
Negative Logits
He        -0.96
His       -0.90
<eos>     -0.88
But       -0.88
Crítica   -0.87
Rusia     -0.86
Además    -0.85
وَ        -0.85
May       -0.84
More      -0.84
Positive Logits
increa    3.30
effe      3.25
!...      3.18
?...      3.17
fta       3.13
suscep    3.06
ftu       3.05
inev      3.04
desir     3.04
thut      3.04
Activations Density: 0.438%