INDEX
Explanations
references to similarity or equivalence in various contexts
Neuron Alignment
Index   Value   % of L₁
156     +0.24   1.4%
376     +0.13   0.7%
30      +0.11   0.6%
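
The dashboard does not define the "% of L₁" column. One plausible reading, assumed here and not confirmed by the source, is that it reports each neuron's absolute value as a share of the L₁ norm (the sum of absolute values) of the whole vector. The sketch below illustrates that reading on a toy vector; the helper names `l1_share` and `top_aligned` are hypothetical, and the toy values are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: this mirrors the "% of L1" column; the source does not say so.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("a zero vector has no L1 norm to normalize by")
    return abs(weights[index]) / l1


def top_aligned(weights, k=3):
    """Return the k largest-magnitude entries as (index, value, share) tuples."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], l1_share(weights, i)) for i in order[:k]]


# Toy vector for illustration only.
toy = [0.5, -0.25, 0.125, 0.125]
for idx, value, share in top_aligned(toy, k=2):
    print(f"{idx}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, the three rows above would each imply a total L₁ norm of roughly 17 to 18, which is at least mutually consistent given the rounding of the displayed values.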
Correlated Neurons
Index   P. Corr.   Cos Sim.
342     +0.24      0.01
408     +0.13      0.01
392     +0.11      0.01
Negative Logits
Ĵ       -2.12
»       -1.99
Ŀ       -1.92
ĨĴ      -1.86
º       -1.84
ľĵ      -1.82
eva     -1.71
±       -1.67
Ľ       -1.65
son     -1.63
Positive Logits
amounts       1.86
position      1.59
positions     1.52
between       1.49
ioned         1.47
acting        1.46
territories   1.45
sized         1.43
rency         1.43
itance        1.42
Activations Density: 0.014%