INDEX
Explanations
questions and beliefs expressed in writing
Neuron Alignment
Index    Value    % of L₁
876      +0.11    0.3%
1871     +0.11    0.3%
381      +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
862      +0.11       0.01
9        +0.11       0.01
1324     +0.10       0.02
Negative Logits
.       -0.79
and     -0.76
,       -0.76
↵↵      -0.73
with    -0.72
–       -0.72
[       -0.72
:       -0.72
;       -0.71
?       -0.71
Positive Logits
nutr         1.93
lidl         1.80
stockholm    1.78
erec         1.77
wien         1.75
embra        1.74
dises        1.74
blos         1.74
exem         1.72
effe         1.70
Activations Density: 0.094%