INDEX
Explanations
author names and journal publication information
Neuron Alignment
Index   Value   % of L₁
1343    +0.18   0.5%
1699    +0.14   0.4%
1741    +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.18      0.04
323     +0.14      0.03
1343    +0.13      0.03
Negative Logits
adalajara      -0.78
">...          -0.69
Quiénes        -0.69
soggior        -0.68
interessanti   -0.67
ayaquil        -0.67
succede        -0.66
scattata       -0.66
purtroppo      -0.66
Economía       -0.65
Positive Logits
Deviant     0.94
&.          0.89
Illus       0.84
Addon       0.84
noyau       0.83
[?]         0.79
^^^         0.78
mécanisme   0.77
maintien    0.76
Hd          0.74
Activations Density: 0.052%