INDEX
Explanations
terms related to inclusion in various contexts
Neuron Alignment
Index   Value   % of L₁
376     +0.14   0.8%
487     +0.12   0.7%
25      +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
204     +0.14      0.01
25      +0.12      0.01
347     +0.10      0.01
Negative Logits
ĨĴ            -1.94
»¿            -1.92
Į             -1.88
ķ             -1.79
§             -1.79
Ŀ             -1.78
unnumbered    -1.73
Ļª            -1.70
ĵ             -1.66
¸             -1.65
Positive Logits
imates        1.91
cript         1.74
aneous        1.65
appears       1.58
andidate      1.53
paired        1.52
limits        1.52
andidates     1.51
agraph        1.51
uer           1.49
Activations Density: 0.014%