INDEX
Explanations
New Auto-Interp: quotations starting with "That" or "That's"
Neuron Alignment
Index   Value   % of L₁
161     +0.15   0.5%
32      +0.12   0.4%
1741    +0.11   0.4%
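The "% of L₁" column can be read as each weight's share of the decoder vector's L₁ norm. Below is a minimal sketch of that arithmetic. It assumes the feature's decoder vector is available as a plain array with one entry per MLP neuron. `neuron_alignment` is a hypothetical helper, and ranking by absolute weight is our assumption, not necessarily how the dashboard orders neurons.

```python
import numpy as np


def neuron_alignment(decoder_vec, k=3):
    """Hypothetical helper: top-k neurons for one feature's decoder vector.

    Returns (neuron_index, weight, share_of_L1) tuples. Ranking by absolute
    weight is an assumption, not necessarily the dashboard's rule.
    """
    v = np.asarray(decoder_vec, dtype=float)
    l1 = np.abs(v).sum()  # L1 norm of the whole decoder vector
    # Stable sort on descending |weight| so ties keep their original order.
    order = np.argsort(-np.abs(v), kind="stable")[:k]
    return [(int(i), float(v[i]), float(abs(v[i]) / l1)) for i in order]


# Toy 4-neuron decoder vector (made-up numbers, not from the dashboard).
print(neuron_alignment([0.0, 0.5, -0.25, 0.25], k=2))
# → [(1, 0.5, 0.5), (2, -0.25, 0.25)]
```

Under this reading, a weight of +0.15 that is 0.5% of L₁ implies the decoder vector's total absolute weight is spread over many neurons, so no single neuron dominates the feature.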
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
161     +0.15           0.09
1023    +0.12           0.07
32      +0.11           0.07
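Pearson correlation is cosine similarity applied after mean-centering both vectors, so even on the same pair of vectors the two numbers can differ. Below is a minimal sketch. It assumes both measures compare a feature's activations with a neuron's activations over the same tokens; the dashboard does not state which vectors its Cos Sim. column uses. `pearson_and_cosine` is a hypothetical helper.

```python
import numpy as np


def pearson_and_cosine(a, b):
    """Hypothetical helper returning (pearson, cosine) for two equal-length vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Cosine similarity: angle between the raw (uncentered) vectors.
    cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Pearson correlation: the same formula after subtracting each mean.
    ac, bc = a - a.mean(), b - b.mean()
    pearson = float(ac @ bc / (np.linalg.norm(ac) * np.linalg.norm(bc)))
    return pearson, cosine


# Two toy activation traces (made-up numbers) that never fire on the same token.
print(pearson_and_cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

For these toy traces the cosine is exactly 0, while the Pearson correlation is about -0.5: centering turns "both inactive" tokens into shared below-average values and "one fires, the other doesn't" into opposite-signed deviations.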
Negative Logits
inext       -1.02
outlander   -1.01
dispen      -1.01
compen      -1.01
ardu        -1.01
unce        -1.00
unil        -1.00
lein        -0.99
laun        -0.98
deleter     -0.98
Positive Logits
That        0.80
That        0.79
that        0.72
that        0.67
THAT        0.67
shouldn     0.66
THAT        0.65
means       0.61
wouldn      0.60
could       0.59
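One common way to produce lists like these is to project the feature's output direction through the unembedding matrix and keep the lowest- and highest-scoring tokens. Below is a minimal sketch. It assumes a residual-stream direction `residual_dir` and an unembedding `W_U` of shape (d_model, vocab), and it ignores any final LayerNorm. `logit_effects` and `top_and_bottom` are hypothetical helpers, and the toy numbers are made up.

```python
import numpy as np


def logit_effects(residual_dir, W_U):
    """Hypothetical: direct effect of a residual-stream direction on each token's logit.

    W_U has shape (d_model, vocab). Any final LayerNorm is ignored here.
    """
    return np.asarray(residual_dir, dtype=float) @ np.asarray(W_U, dtype=float)


def top_and_bottom(effects, vocab, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    order = np.argsort(effects, kind="stable")  # ascending by effect
    bottom = [(vocab[i], float(effects[i])) for i in order[:k]]
    top = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return top, bottom


# Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.8, -1.0, 0.7, -0.5],
                [5.0, 5.0, 5.0, 5.0]])
effects = logit_effects([1.0, 0.0], W_U)
top, bottom = top_and_bottom(effects, vocab, k=2)
print(top)     # [('a', 0.8), ('c', 0.7)]
print(bottom)  # [('b', -1.0), ('d', -0.5)]
```

Because only the direct path through `W_U` is measured, these scores describe what the feature pushes the next-token prediction toward, not which tokens make it fire.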
Activation Density: 0.170%
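Activation density is the fraction of tokens on which the feature fires, so 0.170% corresponds to roughly 1.7 activating tokens per 1,000. Below is a minimal sketch. It assumes "fires" means a strictly positive activation. `activation_density` is a hypothetical helper, and the toy activations are made up.

```python
import numpy as np


def activation_density(acts):
    """Hypothetical helper: fraction of tokens where the feature's activation is > 0."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > 0))


# 2 firing tokens out of 1,000 (made-up numbers) gives a density of 0.2%.
acts = np.zeros(1000)
acts[[17, 503]] = [0.4, 1.3]
print(f"{activation_density(acts):.3%}")  # 0.200%
```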