INDEX
Explanations
phrases related to step-by-step processes or actions in a narrative
Neuron Alignment
Index   Value   % of L₁
1013    +0.10   0.3%
297     +0.09   0.3%
1490    +0.08   0.2%
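The "% of L₁" column presumably reports each neuron's absolute weight as a share of the L₁ norm of the full feature vector. The dashboard does not state this, so the sketch below is only an illustration under that assumption; the function name `l1_share` and the toy vector are hypothetical.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of `weights`.

    Assumption: "% of L1" means absolute value divided by the sum of
    absolute values. This reading is not confirmed by the dashboard.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Toy vector, made up for illustration (not taken from the dashboard).
toy = [0.5, -0.25, 0.25]
print(f"{l1_share(toy, 0):.1%}")  # 50.0%
```

Under this reading, a value of +0.10 at 0.3% would imply an L₁ norm of roughly 30 to 40 for the full vector, allowing for rounding in both columns.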
Correlated Neurons
Index   P. Corr.   Cos Sim.
1812    +0.10      0.03
1438    +0.09      0.03
448     +0.08      0.03
Negative Logits
reluct      -0.93
intersper   -0.92
apprehen    -0.90
increa      -0.89
inev        -0.88
unspeak     -0.85
strick      -0.84
attemp      -0.80
disagre     -0.80
exorbit     -0.78
Positive Logits
except        0.70
<bos>         0.67
except        0.66
audiovisuel   0.64
Except        0.64
including     0.63
INCLUDING     0.59
censiti       0.58
entire        0.57
including     0.57
Activations Density 0.294%
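Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. A minimal sketch, assuming "fires" means a strictly positive activation; the threshold, the function name `activation_density`, and the toy data are assumptions rather than anything the dashboard confirms.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold`.

    Assumption: a token counts as "active" when its activation is
    greater than zero. The dashboard does not state its exact rule.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations, made up for illustration: one active token out of four.
toy = [0.0, 0.0, 1.2, 0.0]
print(f"{activation_density(toy):.3%}")  # 25.000%
```

A density of 0.294% would then correspond to the feature being active on roughly 3 of every 1,000 sampled tokens.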