INDEX
Explanations
phrases related to signing up for newsletters
Neuron Alignment
Index   Value   % of L₁
478     +0.15   0.5%
1937    +0.12   0.4%
1741    +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1937    +0.15      0.05
131     +0.12      0.04
421     +0.10      0.04
Negative Logits
oiseau      -0.69
vermelhas   -0.63
voulons     -0.61
ferons      -0.60
noël        -0.59
vété        -0.59
impati      -0.59
poveznice   -0.57
prêtre      -0.56
ļ           -0.56
Positive Logits
our         0.93
Our         0.93
Our         0.92
our         0.86
OUR         0.78
OUR         0.76
ourselves   0.76
own         0.72
Ours        0.67
我们的      0.65
Activations Density: 0.130%