Explanations
nested list structures and their corresponding parameters
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
409     +0.11   0.6%
283     +0.11   0.6%
449     +0.11   0.6%
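The Neuron Alignment table appears to list the neurons with the largest weights in this feature's decoder vector. Assuming "% of L₁" means each weight's absolute value divided by the L₁ norm of that vector (an interpretation, not stated on the dashboard), the three columns could be computed as in this sketch. The function `neuron_alignment` and the toy vector are hypothetical.

```python
import numpy as np

def neuron_alignment(decoder_vec: np.ndarray, k: int = 3):
    """Return (index, signed weight, share of L1 norm) for the k largest-magnitude entries.

    Assumption: "% of L1" = |w_i| / sum_j |w_j| over the feature's decoder vector.
    """
    l1 = np.abs(decoder_vec).sum()
    # Indices of the k entries with the largest absolute weight, largest first.
    top = np.argsort(-np.abs(decoder_vec))[:k]
    return [(int(i), float(decoder_vec[i]), float(abs(decoder_vec[i]) / l1)) for i in top]

# Hypothetical toy decoder vector over 5 neurons (not the feature shown above).
vec = np.array([0.05, -0.40, 0.10, 0.30, -0.15])
for idx, val, frac in neuron_alignment(vec):
    print(f"{idx:>5} {val:+.2f} {frac:.1%}")
```

Ranking by absolute value means a strongly negative weight can appear in the table alongside positive ones; the sign is kept in the Value column.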
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
213     +0.11           0.01
129     +0.11           0.00
322     +0.11           0.01
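The Correlated Neurons table pairs each neuron with a Pearson correlation and a cosine similarity. The sketch below assumes the correlation is taken between the feature's activations and the neuron's activations across dataset tokens, and that the cosine similarity compares the feature's decoder vector with the neuron's unit basis vector. Both readings are assumptions, not stated on the dashboard; all function names and the synthetic data are hypothetical.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two equal-length activation series."""
    return float(np.corrcoef(x, y)[0, 1])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_correlated_neurons(feature_acts: np.ndarray, neuron_acts: np.ndarray, k: int = 3):
    """feature_acts: (n_tokens,); neuron_acts: (n_tokens, n_neurons).
    Returns (neuron index, Pearson correlation) for the k most correlated neurons."""
    corrs = np.array([pearson(feature_acts, neuron_acts[:, j])
                      for j in range(neuron_acts.shape[1])])
    top = np.argsort(-corrs)[:k]
    return [(int(j), float(corrs[j])) for j in top]

# Synthetic data: 8 hypothetical neurons; the feature tracks neuron 5 plus noise.
rng = np.random.default_rng(0)
neuron_acts = rng.normal(size=(1000, 8))
feature_acts = neuron_acts[:, 5] + 0.5 * rng.normal(size=1000)
print(top_correlated_neurons(feature_acts, neuron_acts))

# Cosine similarity between a hypothetical decoder vector and neuron 5's basis vector.
decoder_vec = rng.normal(size=8)
basis_5 = np.eye(8)[5]
print(cosine_similarity(decoder_vec, basis_5))
```

Under the basis-vector reading, the cosine similarity reduces to w_j / ‖w‖₂, so a neuron can be weakly correlated with the feature's activations while still having a near-zero cosine similarity, as in the rows above.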
Negative Logits
Ĥ     -2.91
¿½    -2.70
©     -2.29
Ĥ¬    -2.17
¶     -2.13
¸     -2.13
®     -2.13
Ĩ     -2.11
ı     -2.06
Ħ     -1.96
Positive Logits
hers         1.39
ales         1.35
eLife        1.33
pos          1.33
inbox        1.31
possessions  1.29
late         1.28
quier        1.27
responsive   1.25
than         1.24
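The two logit lists presumably rank vocabulary tokens by how much this feature's decoder direction raises or lowers their output logits. A common way to estimate this, assumed here, is to multiply the decoder direction by the unembedding matrix and sort the result, ignoring any layers or normalization between the feature and the unembedding. The function `logit_effects`, the toy vocabulary, and the identity unembedding are hypothetical.

```python
import numpy as np

def logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 10):
    """decoder_dir: (d_model,); W_U: (vocab_size, d_model); vocab: token strings.
    Returns the k lowest and k highest direct logit effects as (token, value) pairs.
    Assumption: direct path only (no intermediate layers or final layer norm)."""
    effects = W_U @ decoder_dir          # one scalar per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: identity unembedding, 4-token vocabulary (all values hypothetical).
vocab = ["a", "b", "c", "d"]
W_U = np.eye(4)
decoder_dir = np.array([0.5, -2.0, 1.3, 0.1])
neg, pos = logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('b', -2.0), ('d', 0.1)]
print(pos)  # [('c', 1.3), ('a', 0.5)]
```

The "negative" list is simply the bottom k, so in a tiny vocabulary it can contain non-negative values such as `('d', 0.1)`; with a full vocabulary, as in the lists above, the bottom entries are all negative.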
Activation Density: 0.071%
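Activation density is most naturally read as the fraction of dataset tokens on which the feature is active; assuming "active" means a positive activation, 0.071% corresponds to roughly 7 tokens in every 10,000. A minimal sketch with a hypothetical activation array:

```python
import numpy as np

def activation_density(feature_acts: np.ndarray) -> float:
    """Fraction of tokens on which the feature fires (assumed: activation > 0)."""
    return float(np.mean(feature_acts > 0))

acts = np.zeros(100_000)
acts[:71] = 1.0   # hypothetical: the feature fires on 71 of 100,000 tokens
print(f"{activation_density(acts):.3%}")  # 0.071%
```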