INDEX
Explanations
words related to numbers and mathematical concepts
Neuron Alignment
Index   Value   % of L₁
410     +0.14   0.8%
203     +0.14   0.8%
111     +0.13   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
38      +0.14      0.03
67      +0.14      0.03
7       +0.13      0.02
Negative Logits
àµį            -2.32
à±į            -2.28
à°¿            -2.06
rowser         -1.93
à±ģ            -1.83
à¯į            -1.76
ável           -1.71
ા              -1.71
able           -1.71
unsuccessful   -1.70
Positive Logits
ĵ     4.85
IJ    4.80
·     4.77
ĻĤ    4.65
Ĭ     4.62
¤     4.60
©     4.55
į     4.52
¾     4.48
Ĵ     4.47
Activations Density: 0.227%