INDEX
Explanations
instances of the word "when."
Neuron Alignment
Index   Value   % of L₁
376     +0.15   0.8%
478     +0.13   0.7%
174     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
174     +0.15      0.07
154     +0.13      0.08
112     +0.11      0.08
Negative Logits
Token      Logit
scrape     -1.56
Rhodes     -1.54
ray        -1.50
ipes       -1.50
usc        -1.49
elij       -1.49
remember   -1.46
membrane   -1.43
iston      -1.40
Scotia     -1.40
Positive Logits
Token   Logit
ĥ½      3.31
ı       3.15
Ĩ       3.03
Ĥ       2.83
¤       2.69
¹       2.67
Ļ       2.65
·       2.63
Ģ       2.50
ī       2.49
Activations Density 0.126%