INDEX
Explanations
nouns related to essential human concepts and entities
Neuron Alignment
Index   Value   % of L₁
156     +0.24   1.3%
111     +0.17   1.0%
77      +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
11      +0.24      0.01
97      +0.17      0.01
393     +0.12      0.01
Negative Logits
ppat    -1.80
\[*     -1.69
âĢķ     -1.67
**[     -1.60
pgen    -1.49
dom     -1.46
‘       -1.43
([**    -1.40
↵       -1.39
pntd    -1.35
Positive Logits
«       3.10
ĻĤ      2.98
ł       2.97
IJ      2.91
Ĩ       2.81
»¿      2.80
Ī       2.79
Īĺ      2.77
¦       2.77
ĸ       2.73
Activations Density 0.051%
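
Several token strings in the logit lists (for example `âĢķ` and `ĻĤ`) look like byte-level BPE symbols rather than encoding damage. The page does not say which tokenizer produced them, so the sketch below assumes the GPT-2 style byte-to-unicode table and shows how such a symbol string could be mapped back to raw bytes. The helper names `bytes_to_unicode` and `token_to_bytes` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2 style
    byte-level BPE (assumed here; the page does not name its tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Convert a displayed token string back to the bytes it stands for."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["âĢķ", "ĻĤ", "«"]:
        raw = token_to_bytes(tok)
        # Partial UTF-8 sequences cannot be decoded alone, so replace them.
        print(repr(tok), raw, repr(raw.decode("utf-8", errors="replace")))
```

Under that assumption, `âĢķ` corresponds to the bytes `E2 80 95`, a complete UTF-8 sequence, while `ĻĤ` corresponds to `99 82`, which are continuation bytes that only form a character together with a preceding lead byte. Characters outside the table, such as `↵`, raise a `KeyError` in this sketch and are best treated as display substitutions made by the page.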