INDEX
Explanations
URLs or hyperlinks in the text
Neuron Alignment
Index    Value    % of L₁
478      +0.14    0.8%
315      +0.13    0.7%
197      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
315      +0.14       0.04
462      +0.13       0.03
458      +0.11       0.04
Negative Logits
manship        -1.80
fected         -1.55
zzles          -1.54
transmitted    -1.50
function       -1.49
well           -1.49
borne          -1.46
malfunction    -1.46
?”             -1.44
hereto         -1.43
Positive Logits
ĥ½    3.17
½     2.95
Ŀ     2.95
¿     2.91
¾     2.88
¤     2.83
İ     2.80
¦     2.79
ĨĴ    2.77
¥     2.76
Activations Density 0.137%