INDEX
Explanations
expressions indicating permanence or frequency
Neuron Alignment
Index   Value   % of L₁
1363    +0.13   0.4%
1334    +0.12   0.4%
120     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1480    +0.13      0.04
1092    +0.12      0.04
1334    +0.11      0.04
Negative Logits
alberto   -0.68
bangkok   -0.67
Quod      -0.66
kani      -0.65
ikat      -0.64
andrea    -0.64
Cfr       -0.63
Epif      -0.63
guma      -0.63
baka      -0.63
Positive Logits
ALWAYS    1.09
always    1.09
Always    1.04
always    1.03
Always    1.02
ALWAYS    1.01
deauna    0.90
<bos>     0.86
alway     0.83
siempre   0.83
Activations Density: 0.185%