Explanations
New Auto-Interp: references to the name "Martin."
Neuron Alignment
Index   Value   % of L₁
156     +0.14   0.8%
184     +0.13   0.8%
376     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
363     +0.14      0.01
469     +0.13      0.01
385     +0.11      0.01
Negative Logits
ĥ½          -2.13
'$          -1.91
¬           -1.83
?”          -1.72
¿½          -1.70
Ģ           -1.69
?_          -1.62
_________   -1.61
"?          -1.61
?"          -1.60
Positive Logits
pora    2.06
cule    1.96
ique    1.94
cules   1.79
idone   1.76
dale    1.74
dorff   1.74
ico     1.73
boro    1.71
culo    1.69
Activations Density 0.016%