INDEX
Explanations
sentences that convey a sense of finality or conclusion
Neuron Alignment
Index    Value    % of L₁
476      +0.12    0.7%
328      +0.12    0.7%
292      +0.12    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
111      +0.12       0.14
156      +0.12       0.06
271      +0.12       0.07
Negative Logits
·¸       -1.68
ĥ½       -1.66
³        -1.60
¨        -1.52
unto     -1.50
onin     -1.49
ozo      -1.48
µ        -1.48
§        -1.47
{})      -1.42
Positive Logits
adays        1.59
itary        1.55
mind         1.52
try          1.45
             1.38
afternoon    1.34
nap          1.33
chant        1.33
contrary     1.33
maybe        1.32
Activations Density 0.055%