INDEX
Explanations
terminology related to evaluations and outcomes
Neuron Alignment
Index   Value   % of L₁
2034    +0.15   0.5%
1842    +0.15   0.4%
1013    +0.14   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
284     +0.15      0.10
1013    +0.15      0.10
942     +0.14      0.07
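The two columns above can be read as related but distinct measures. The sketch below is illustrative only: it assumes "P. Corr." is a Pearson correlation and "Cos Sim." is a cosine similarity, which the page itself does not state, and it does not claim to know which vectors the dashboard compares. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centred_x = [a - mean_x for a in x]
    centred_y = [b - mean_y for b in y]
    return cosine_similarity(centred_x, centred_y)
```

Because Pearson correlation subtracts the mean first, the two values can differ for the same pair of vectors, which is consistent with the columns above not matching.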
Negative Logits
Token      Logit
increa     -2.10
affor      -1.89
encomp     -1.89
scrat      -1.86
unden      -1.83
desir      -1.83
impra      -1.82
guarante   -1.81
purcha     -1.81
inev       -1.78
Positive Logits
Token      Logit
.          0.78
because    0.74
despite    0.74
;          0.74
وأن        0.70
but        0.70
".         0.68
while      0.68
。         0.67
JTable     0.66
Activations Density 0.914%
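As a rough sketch of what a density figure like this usually measures, the snippet below computes the fraction of sampled tokens on which a feature is active. It assumes "active" means an activation strictly above a threshold of zero; the page does not state the threshold or the sample used, and `activation_density` is a hypothetical helper name.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of sampled activations strictly above `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation exceeds the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Example: one active token out of four sampled tokens.
density = activation_density([0.0, 0.0, 0.5, 0.0])
print(f"{density:.3%}")
```

Under this reading, a density of 0.914% would mean the feature fires on a little under one token in a hundred.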