Explanations
references to fundraising (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
376     +0.16   1.0%
463     +0.13   0.8%
191     +0.12   0.7%
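The "% of L₁" column relates each neuron's weight to the total L₁ norm of the feature's decoder vector. The sketch below is a minimal illustration under stated assumptions: the function name `neuron_alignment` is hypothetical, neurons are ranked by signed decoder weight, and the share is |weight| divided by the sum of absolute weights. The dashboard's exact ranking rule is not stated here.

```python
def neuron_alignment(decoder_vec, k=3):
    """Rank neurons by their weight in a feature's decoder vector (hypothetical helper).

    Returns (neuron_index, weight, share_of_l1) for the k largest signed weights,
    where share_of_l1 = |weight| / sum(|w| for w in decoder_vec).
    Ranking by signed weight is an assumption of this sketch.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    ranked = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector (made-up values, not taken from the dashboard).
for idx, value, share in neuron_alignment([0.1, -0.2, 0.5, 0.2], k=2):
    print(f"{idx:<6}{value:+.2f}  {share:.1%}")
```

Read this way, a weight of +0.16 that is 1.0% of L₁ implies the decoder vector's L₁ norm is about 16.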
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
463     +0.16           0.02
174     +0.13           0.02
191     +0.12           0.02
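The two statistics in this table can be sketched as follows. This assumes "Pearson Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and that "Cos Sim." is a cosine similarity between two weight vectors. Which vectors the dashboard actually compares is not stated on the card, and the helper names (`pearson`, `cosine_similarity`, `top_correlated`) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError if either input is constant


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_correlated(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature.

    feature_acts: the feature's activation on each token.
    neuron_acts: dict mapping neuron index -> activations on the same tokens.
    """
    scores = {i: pearson(feature_acts, acts) for i, acts in neuron_acts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy data (made up): neuron 7 tracks the feature exactly, neuron 9 is anti-correlated.
feature = [0.0, 1.0, 0.0, 2.0]
neurons = {7: [0.1, 1.1, 0.1, 2.1], 9: [2.0, 0.0, 2.0, 0.0], 4: [0.0, 0.5, 1.0, 0.5]}
print(top_correlated(feature, neurons, k=2))
```

Because the two columns measure different things (co-activation versus geometric overlap), a neuron can correlate moderately with a feature while its cosine similarity stays near zero, as in every row above.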
Negative Logits
Token     Logit
widely    -1.60
áĢº       -1.56
UGH       -1.55
heavily   -1.53
happier   -1.51
blogger   -1.51
woke      -1.50
stark     -1.50
ingle     -1.49
sharply   -1.47
Positive Logits
Token     Logit
ģ         2.44
ī         2.43
Ħ         2.41
ľ         2.29
°         2.25
ŀ         2.25
notes     2.23
Ģ         2.11
ľĵ        2.09
raiser    2.07
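The positive and negative logit lists rank output tokens by how strongly the feature directly pushes their logits up or down. A minimal sketch, assuming the effect is the dot product of the feature's output direction with each token's unembedding vector (ignoring any final layer norm). The helper names and the toy vectors are hypothetical, not the dashboard's data.

```python
def logit_weights(direction, unembedding):
    """Direct effect of a feature direction on each output token's logit.

    direction: the feature's output direction in the residual stream (assumed).
    unembedding: dict mapping token string -> unembedding vector of the same dimension.
    """
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in unembedding.items()}


def top_and_bottom(weights, k=10):
    """Return the k most positive and the k most negative (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


# Hypothetical 2-dimensional toy example.
w = logit_weights([1.0, 0.0], {"tok_a": [2.0, 0.0], "tok_b": [-1.5, 3.0], "tok_c": [0.5, 1.0]})
positive, negative = top_and_bottom(w, k=1)
print(positive, negative)  # [('tok_a', 2.0)] [('tok_b', -1.5)]
```

Tokens such as "ģ" or "ľĵ" in the lists are byte-level BPE pieces rendered as printable characters, so they often correspond to fragments of multi-byte characters rather than whole words.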
Activation Density: 3.190%
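Activation density is the fraction of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly greater than zero; the helper name and the threshold parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds threshold."""
    return sum(1 for a in activations if a > threshold) / len(activations)


# Toy activations over 8 tokens (made up): the feature fires on 2 of them.
density = activation_density([0.0, 0.7, 0.0, 0.0, 1.3, 0.0, 0.0, 0.0])
print(f"Activation density: {density:.3%}")  # Activation density: 25.000%
```

Under this reading, a density of 3.190% means the feature is active on roughly one token in 31.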