Explanations
statements related to decision-making or critical thinking
Neuron Alignment
Index    Value    % of L₁
1108     +0.29    1.1%
599      +0.23    0.9%
998      +0.15    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1108     +0.29       0.09
599      +0.23       0.07
1819     +0.15       0.08
Negative Logits
Token           Logit
fta             -1.11
ftu             -1.00
NOO             -0.99
Ikr             -0.97
«<              -0.97
fuf             -0.96
fto             -0.95
fte             -0.95
purcha          -0.95
effe            -0.94
Positive Logits
Token           Logit
OMITBAD         0.57
CELLANEOUS      0.56
виправивши      0.52
cellaneous      0.50
becue           0.49
atiable         0.48
</strong>       0.48
AnchorStyles    0.47
.               0.45
prostu          0.44
Activations Density: 1.019%