INDEX
Explanations
expressions indicating advice or opinions
Neuron Alignment
Index   Value   % of L₁
2034    +0.13   0.4%
1042    +0.09   0.3%
624     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.13      0.06
1959    +0.09      0.06
1775    +0.09      0.04
Negative Logits
coö        -1.34
suspic     -1.31
emphat     -1.31
increa     -1.26
reluct     -1.26
wherea     -1.24
effe       -1.23
embra      -1.20
uninten    -1.20
disagre    -1.19
Positive Logits
unless       0.80
unless       0.74
Cannot       0.73
nor          0.71
Therefore    0.71
cannot       0.69
nor          0.66
must         0.65
cannot       0.64
Therefore    0.63
Activations Density: 0.471%