INDEX
Explanations
names or references to individuals
Neuron Alignment
Index    Value    % of L₁
376      +0.14    0.8%
485      +0.12    0.7%
171      +0.12    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
255      +0.14       0.01
485      +0.12       0.01
34       +0.12       0.01
Negative Logits
Ń     -3.03
§     -2.71
Ĺ     -2.62
³     -2.55
«     -2.50
Ĭ     -2.43
·¸    -2.38
ĥ     -2.36
ľ     -2.29
¬     -2.26
Positive Logits
nel        2.51
nal        1.89
aho        1.86
arman      1.76
ville      1.76
uscript    1.73
ault       1.73
agem       1.69
stru       1.68
nas        1.66
Activations Density: 0.232%