INDEX
Explanations
phrases related to organization and consolidation of information
Neuron Alignment
Index   Value   % of L₁
1385    +0.14   0.4%
1496    +0.07   0.2%
859     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1496    +0.14      0.03
1644    +0.07      0.01
273     +0.07      0.02
Negative Logits
alre         -1.03
increa       -0.98
intersper    -0.97
unspeak      -0.94
depic        -0.94
vainly       -0.94
guarante     -0.94
strick       -0.94
fortn        -0.93
unve         -0.93
Positive Logits
unified      0.68
<bos>        0.63
single       0.62
cohesive     0.60
single       0.59
unified      0.56
complish     0.53
bó           0.53
umbrella     0.52
hesive       0.52
Activations Density 0.226%