Explanations: definitions or meanings of terms
Neuron Alignment
Index   Value   % of L₁
376     +0.14   0.8%
165     +0.12   0.7%
144     +0.12   0.7%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
287     +0.14           0.04
484     +0.12           0.03
210     +0.12           0.04
Negative Logits
Token   Logit
§       -1.83
¦       -1.75
»¿      -1.72
¡       -1.72
±       -1.59
ught    -1.58
»       -1.58
²       -1.57
¢       -1.55
¬       -1.54
Positive Logits
Token       Logit
whereby     1.71
booth       1.56
ession      1.52
operative   1.49
heets       1.46
walks       1.46
pace        1.43
device      1.43
forge       1.42
IZE         1.42
Activation Density: 0.046%