INDEX
Explanations
phrases related to specific dates or events
Neuron Alignment
Index   Value   % of L₁
1127    +0.14   0.6%
878     +0.14   0.6%
1065    +0.14   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
689     +0.14      0.03
981     +0.14      0.04
678     +0.14      0.03
Negative Logits
own   -0.52
about   -0.51
of   -0.51
lavet   -0.49
أبريل   -0.49
Heritage   -0.49
الله   -0.48
什么   -0.48
like   -0.48
GARET   -0.47
Positive Logits
matel   1.20
squa   1.12
paillettes   1.12
immen   1.12
quoique   1.10
parteci   1.08
chande   1.08
casio   1.06
incess   1.06
autob   1.05
Activations Density: 0.163%