INDEX
Explanations
phrases related to statistical distribution and prediction
Neuron Alignment
Index    Value    % of L₁
678      +0.13    0.4%
1671     +0.10    0.3%
1363     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1363     +0.13       0.06
678      +0.10       0.06
270      +0.10       0.04
Negative Logits
.              -0.76
in             -0.75
and            -0.75
of             -0.72
to             -0.72
якості         -0.71
(not shown)    -0.71
śmierci        -0.71
другому        -0.70
,              -0.70
Positive Logits
immen     1.91
abbra     1.90
dispen    1.89
overla    1.86
ridu      1.85
erec      1.81
pessi     1.79
igno      1.78
robus     1.77
incess    1.77
Activations Density: 0.243%