Neuron Alignment
Index   Value   % of L₁
1735    +0.07   0.2%
917     +0.07   0.2%
1510    +0.07   0.2%
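A minimal sketch of how the Neuron Alignment columns could be derived. It assumes something this page does not state: "Value" is a neuron's entry in the feature's decoder vector, and "% of L₁" is that entry's magnitude divided by the L₁ norm of the whole vector. The helper name `neuron_alignment` and its `top_k` parameter are hypothetical.

```python
import numpy as np

def neuron_alignment(decoder_vec, top_k=3):
    """Hypothetical helper: list the neurons with the largest-magnitude
    entries in a feature's decoder vector (assumed definition).

    Returns (index, value, percent_of_l1) tuples, where percent_of_l1 is
    |value| divided by the L1 norm of the whole vector, times 100.
    """
    d = np.asarray(decoder_vec, dtype=float)
    l1 = np.abs(d).sum()
    # Sort by descending magnitude; "stable" keeps ties in index order.
    order = np.argsort(-np.abs(d), kind="stable")[:top_k]
    return [(int(i), float(d[i]), float(100.0 * abs(d[i]) / l1)) for i in order]


# Toy decoder vector, printed in the same three-column layout as the table.
for idx, value, pct in neuron_alignment([0.1, -0.5, 0.2, 0.2], top_k=2):
    print(f"{idx:<8}{value:+.2f}   {pct:.1f}%")
```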
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
137     +0.07           0.04
907     +0.07           0.03
1735    +0.07           0.03
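A sketch of the two similarity measures in the Correlated Neurons table. It assumes that "P. Corr." means Pearson correlation and that both columns compare the feature's activations with a neuron's activations over the same set of tokens. Pearson correlation is the cosine similarity of the mean-centered vectors, so the two columns can differ for the same pair. The helper name `pearson_and_cosine` is hypothetical.

```python
import numpy as np

def pearson_and_cosine(feature_acts, neuron_acts):
    """Hypothetical helper: compare two activation vectors recorded on the
    same tokens (assumed setup).

    Cosine similarity uses the raw values; Pearson correlation is the
    cosine similarity after subtracting each vector's mean.
    """
    f = np.asarray(feature_acts, dtype=float)
    n = np.asarray(neuron_acts, dtype=float)
    cos = float(f @ n / (np.linalg.norm(f) * np.linalg.norm(n)))
    fc, nc = f - f.mean(), n - n.mean()
    pearson = float(fc @ nc / (np.linalg.norm(fc) * np.linalg.norm(nc)))
    return pearson, cos


# Toy activations on four tokens.
p, c = pearson_and_cosine([0, 1, 0, 2], [1, 2, 0, 3])
print(f"Pearson {p:+.2f}, cosine {c:.2f}")
```

The Pearson value can be cross-checked against `np.corrcoef`, which computes the same quantity.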
Negative Logits
emphat      -1.21
increa      -1.21
intermitt   -1.09
impra       -1.04
alre        -1.04
depic       -1.04
maneu       -1.03
reluct      -1.03
strick      -1.03
uninten     -1.02
Positive Logits
OGND              0.76
how               0.69
how               0.69
AnchorStyles      0.67
wavering          0.65
verifyException   0.63
MLLoader          0.63
@[+][             0.62
thinkable         0.61
Espèce            0.61
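A sketch of how token lists like the Negative and Positive Logits tables are commonly produced. It assumes, without confirmation from this page, that each value is the feature's decoder direction (expressed in residual-stream space) multiplied by the unembedding matrix `W_U`, with any final LayerNorm ignored. The helper name `logit_effects`, its parameters, and the toy vocabulary are hypothetical.

```python
import numpy as np

def logit_effects(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by decoder_vec @ W_U.

    decoder_vec: shape (d_model,); W_U: shape (d_model, vocab_size);
    vocab: list of token strings indexed by token id.
    Returns (most_negative, most_positive), each a list of (token, value).
    """
    effects = np.asarray(decoder_vec, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(effects, kind="stable")  # ascending by effect
    most_negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    most_positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return most_negative, most_positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
neg, pos = logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print("negative:", neg)
print("positive:", pos)
```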
Activation Density: 0.286%
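A sketch of the activation density statistic. It assumes the figure is the percentage of tokens in a sample on which the feature's activation is strictly positive. The helper name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose feature activation
    exceeds `threshold` (assumed definition of "firing")."""
    acts = np.asarray(feature_acts, dtype=float)
    return float(100.0 * np.mean(acts > threshold))


# Toy sample of four tokens, one of which activates the feature.
print(f"{activation_density([0.0, 0.0, 0.5, 0.0]):.3f}%")
```

Under that reading, a density of 0.286% means the feature fires on roughly 2.86 of every 1,000 tokens.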