INDEX
Explanations
descriptions of unique characteristics, projects, contributions, or situations
Neuron Alignment
Index    Value    % of L₁
1777     +0.14    0.5%
605      +0.13    0.4%
1557     +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1777     +0.14       0.03
1557     +0.13       0.03
973      +0.11       0.03
Negative Logits
Bárbara          -0.51
extrait          -0.50
marchand         -0.50
Kjelder          -0.48
oreille          -0.48
romain           -0.47
Тарихы           -0.47
effectivement    -0.47
joyeux           -0.46
écla             -0.45
Positive Logits
unique        1.03
unique        0.98
Unique        0.97
UNIQUE        0.95
Unique        0.94
UNIQUE        0.85
uniqueness    0.83
uniques       0.76
queness       0.76
uniquely      0.75
Activations Density: 0.108%