INDEX
Explanations
verbs related to management or control
Neuron Alignment
Index   Value   % of L₁
674     +0.20   0.6%
855     +0.09   0.3%
394     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
855     +0.20      0.05
40      +0.09      0.04
1685    +0.09      0.04
Negative Logits
apprehen         -0.57
unspeak          -0.51
perchance        -0.48
Daven            -0.48
Thos             -0.46
Preliminaries    -0.45
habituellement   -0.45
Katso            -0.45
Ibidem           -0.45
Hauts            -0.43
Positive Logits
itself     1.01
Itself     0.89
itself     0.86
affez      0.84
soggior    0.74
abbra      0.72
insegna    0.68
aspetta    0.67
autunno    0.67
accogli    0.67
Activations Density: 0.925%