INDEX
Explanations
sources and credits mentioned in articles
Neuron Alignment
Index    Value    % of L₁
1577     +0.10    0.3%
1403     +0.09    0.3%
1793     +0.08    0.2%
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
1856     +0.10            0.05
1041     +0.09            0.04
1003     +0.08            0.02
Negative Logits
hiszen            -0.62
utivos            -0.59
више              -0.54
توانند            -0.54
Italijanski       -0.54
archiviato        -0.54
DisplayMetrics    -0.54
along             -0.53
بعد               -0.53
برای              -0.53
Positive Logits
increa      1.62
encomp      1.59
affor       1.55
guarante    1.54
scrat       1.53
maneu       1.51
impra       1.51
accla       1.48
suscep      1.47
unden       1.46
Activations Density 0.169%