INDEX
Explanations
references to physical actions or qualities related to objects
Neuron Alignment
Index   Value   % of L₁
690     +0.08   0.2%
1385    +0.08   0.2%
2045    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
284     +0.08      0.06
509     +0.08      0.05
1246    +0.07      0.04
Negative Logits
increa    -1.67
emphat    -1.64
effe      -1.58
encomp    -1.57
reluct    -1.56
alre      -1.55
fta       -1.53
suscep    -1.51
impra     -1.51
accla     -1.51
Positive Logits
until         0.98
throughout    0.97
while         0.86
despite       0.84
during        0.84
until         0.82
till          0.80
keep          0.79
maintained    0.77
maintained    0.77
Activations Density 0.344%