INDEX
Explanations
concepts that emphasize utility and practicality
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
274     +0.14   0.8%
376     +0.13   0.7%
377     +0.11   0.6%
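The % of L₁ column can be read as each neuron's share of the feature's decoder vector, measured by L₁ norm. The sketch below is minimal and rests on two assumptions: Value is the signed decoder weight on that neuron, and % of L₁ is |weight| divided by the sum of absolute weights. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder, top_k=3):
    """Return (neuron_index, weight, share_of_l1) for the top_k neurons by |weight|.

    Assumes `decoder` is the feature's decoder vector over MLP neurons
    (a hypothetical input; the page above does not show the full vector).
    """
    # L1 norm of the whole decoder vector.
    l1 = sum(abs(w) for w in decoder)
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(decoder)), key=lambda i: abs(decoder[i]), reverse=True)
    return [(i, decoder[i], abs(decoder[i]) / l1) for i in ranked[:top_k]]


# Toy 6-neuron decoder vector, made up for illustration.
toy = [0.05, -0.02, 0.40, 0.10, -0.30, 0.13]
for idx, weight, share in neuron_alignment(toy):
    print(f"{idx}  {weight:+.2f}  {share:.1%}")
```

Ranking by absolute value keeps strongly negative weights in the list, while the printed Value column still shows the sign.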
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
274     +0.14           0.02
149     +0.13           0.01
267     +0.11           0.01
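The P. Corr. column is presumably the Pearson correlation between the feature's activations and a neuron's activations over the same tokens. This page does not say which two vectors the Cos Sim. column compares. The sketch below computes both statistics and assumes equal-length inputs. The activation lists in the usage lines are hypothetical toy data, not values from this feature.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant (zero variance).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations for one feature and one neuron.
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.8]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.5]
print(f"{pearson(feature_acts, neuron_acts):+.2f}")
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient from the standard library.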
Negative Logits
ĨĴ              -2.99
Ļª              -2.87
¯               -2.37
»¿              -2.27
Ĵ               -2.17
·¸              -2.01
ķ               -2.01
<|outofrange|>  -1.94
↵               -1.94
↵               -1.94
Positive Logits
erals    1.73
flaw     1.68
ities    1.67
itat     1.63
idade    1.62
ibus     1.61
wise     1.56
idad     1.56
imetry   1.55
ISPR     1.51
Activation Density: 0.010%
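Activation density is presumably the fraction of tokens on which the feature is active. At 0.010%, that works out to roughly 1 token in 10,000. The sketch below is minimal and assumes "active" means an activation strictly above zero. The `threshold` parameter and its default are assumptions, not something stated on this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical default 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 2 active tokens out of 20,000, i.e. a density of 0.010%.
acts = [0.0] * 20_000
acts[123] = 1.5
acts[9_876] = 0.4
print(f"{activation_density(acts):.3%}")  # prints 0.010%
```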