Explanations
names of specific individuals
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
1177    +0.18   0.6%
1842    +0.13   0.4%
227     +0.11   0.3%
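A minimal sketch of how a Neuron Alignment table like this one could be computed. It assumes "Value" is the entry of the feature's decoder vector at that neuron index and "% of L₁" is that entry's magnitude as a share of the vector's L₁ norm. The helper `neuron_alignment` and the toy vector are hypothetical, and the viewer's actual computation is not given here.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (index, value, share_of_l1) for the k entries with the largest |value|.

    Hypothetical helper: assumes the feature's decoder vector is indexed by neuron.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Sort neuron indices by the magnitude of their weight, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Made-up decoder vector, not the real feature's weights.
toy_decoder = [0.0, 0.5, -0.1, 0.25, 0.15]
for idx, val, share in neuron_alignment(toy_decoder):
    print(f"{idx:>5}  {val:+.2f}  {share:.1%}")
```

Ranking by magnitude means a large negative weight would also appear in the table; the sign is kept in the "Value" column.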
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1177    +0.18           0.05
1314    +0.13           0.08
981     +0.11           0.09
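The two columns of the Correlated Neurons table can be illustrated with plain Pearson correlation and cosine similarity. In this sketch both are computed between a feature's activations and each neuron's activations over the same tokens. That pairing is an assumption (the viewer may compute cosine similarity on weight vectors instead), and `top_correlated` and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant input; treated as uncorrelated here
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity: the uncentered counterpart of Pearson correlation."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_correlated(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature (hypothetical helper)."""
    rows = [(i, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for i, acts in enumerate(neuron_acts)]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:k]


# Toy activations over four tokens (made-up data).
feature = [0.0, 1.0, 0.0, 2.0]
neurons = [[0.1, 0.9, 0.0, 2.1], [1.0, 0.0, 1.0, 0.0]]
for idx, r, cs in top_correlated(feature, neurons, k=2):
    print(f"{idx:>5}  {r:+.2f}  {cs:.2f}")
```

Because Pearson correlation subtracts the means first and cosine similarity does not, the two columns can disagree noticeably for the same neuron.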
Negative Logits
for     -0.90
and     -0.89
at      -0.89
in      -0.87
on      -0.86
or      -0.84
↵↵      -0.84
to      -0.82
but     -0.82
as      -0.82
Positive Logits
alkoh      2.15
Traité     2.06
Sén        2.04
Strukt     2.02
embra      2.01
simplif    1.98
mef        1.93
Lég        1.93
dises      1.92
Cfr        1.92
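Logit lists like the negative and positive ones in this card are commonly obtained by projecting a feature's output direction onto the unembedding and keeping the most negative and most positive entries. The sketch below assumes the direction is already expressed in the residual stream and ignores any final LayerNorm. The helpers `logit_weights` and `extreme_logits`, the dictionary-of-rows unembedding layout, and the tokens A to D are all hypothetical.

```python
def logit_weights(direction, unembed):
    """Dot a residual-stream direction with each token's unembedding row.

    `unembed` maps token -> list of d_model floats (hypothetical layout).
    """
    return {tok: sum(a * b for a, b in zip(row, direction)) for tok, row in unembed.items()}


def extreme_logits(weights, k):
    """Return the k most positive and the k most negative (token, weight) pairs."""
    positive = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(weights.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy 2-dimensional residual direction and four-token vocabulary (made-up numbers).
direction = [1.0, -0.5]
unembed = {"A": [2.0, 0.0], "B": [1.0, -1.0], "C": [-0.8, 0.2], "D": [-0.5, 0.9]}
pos, neg = extreme_logits(logit_weights(direction, unembed), k=2)
print("positive:", pos)
print("negative:", neg)
```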
Activation Density: 0.517%
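Activation density is read here as the fraction of tokens on which the feature's activation is above zero, so 0.517% corresponds to roughly 5 active tokens in every 1,000. A minimal sketch under that assumption, with a hypothetical `activation_density` helper and made-up activations:

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not acts:
        raise ValueError("no activations given")
    return sum(1 for a in acts if a > threshold) / len(acts)


# Made-up activations: the feature fires on 1 token out of 200.
acts = [0.0] * 199 + [3.2]
print(f"Activation Density: {activation_density(acts):.3%}")
```

The `threshold` parameter is an illustrative choice; with non-negative (for example ReLU) activations, a threshold of zero simply counts the nonzero entries.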