INDEX
Explanations: mentions of books and publications
Neuron Alignment
Index   Value   % of L₁
752     +0.19   0.6%
16      +0.19   0.6%
1177    +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
16      +0.19      0.08
752     +0.19      0.06
764     +0.13      0.05
Negative Logits
semmi     -0.93
ertion    -0.90
idated    -0.87
riction   -0.82
rather    -0.80
inkább    -0.79
head      -0.79
Đây       -0.78
итоге     -0.78
들        -0.78
Positive Logits
confé      2.23
Chapitre   2.09
hcm        2.02
Sén        2.02
délib      2.01
Strukt     1.97
rafra      1.96
Cfr        1.95
vété       1.93
déliv      1.93
Activations Density: 0.339%