INDEX
Explanations
parenthetical expressions
Neuron Alignment
Index   Value   % of L₁
435     +0.14   0.8%
335     +0.11   0.6%
333     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
188     +0.14      0.24
486     +0.11      0.16
154     +0.11      0.15
Negative Logits
¥     -2.02
§     -1.96
¦     -1.88
IJ    -1.88
¼     -1.81
į     -1.76
¤     -1.74
Ń     -1.72
ļ     -1.71
ĵ     -1.65
Positive Logits
ionine       1.71
$^{-         1.51
tons         1.51
á̝            1.44
ouston       1.42
persists     1.40
adian        1.40
survives     1.39
unreadable   1.39
epigen       1.38
Activations Density: 0.349%