Explanations
words related to names and identities
Neuron Alignment
Index   Value   % of L₁
369     +0.14   0.8%
431     +0.12   0.7%
479     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
279     +0.14      0.07
431     +0.12      0.05
59      +0.12      0.05
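The two columns above report different similarity measures. As a minimal sketch, assuming "P. Corr." stands for Pearson correlation and that both measures are taken over paired activation vectors (the dashboard does not say which vectors it compares), the difference is that Pearson correlation centers each vector on its mean before comparing, while cosine similarity does not. The function names below are hypothetical and are not taken from the tool that produced this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)
```

Because of the centering step, two vectors that share a large constant offset can have a high cosine similarity and a low Pearson correlation, or the reverse, which is one reason the two columns need not agree.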
Negative Logits
cknowled      -1.85
systematic    -1.60
explanatory   -1.60
ncia          -1.59
jeopardy      -1.51
recess        -1.44
hearsay       -1.40
Lisbon        -1.37
residual      -1.36
vdots         -1.35
Positive Logits
§     2.69
Īĺ    2.38
Ĵ     2.33
£     2.32
¬     2.31
ŀ     2.31
®     2.25
Ł     2.22
ļ     2.16
ĭ     2.14
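The positive-logit tokens above look like the printable stand-ins that byte-level BPE vocabularies use for raw bytes. The sketch below assumes a GPT-2-style byte-to-character table, which this page does not confirm, and maps each displayed token back to the bytes it would represent. The names `bytes_to_unicode`, `UNICODE_TO_BYTE`, and `token_to_bytes` are illustrative.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points 256 and above.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Recover the raw bytes behind a displayed byte-level token."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["§", "Īĺ", "Ĵ", "£", "¬", "ŀ", "®", "Ł", "ļ", "ĭ"]:
        print(tok, token_to_bytes(tok).hex())
```

If the assumption holds, every token in the list decodes to bytes in the range 0x80 to 0xBF, which UTF-8 uses for continuation bytes of multi-byte characters. Under a different tokenizer the mapping would not apply.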
Activations Density 0.383%