INDEX
Explanations
words indicating a transition or additional information within a narrative
Neuron Alignment
Index   Value   % of L₁
36      +0.13   0.7%
458     +0.13   0.7%
106     +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
384     +0.13      0.01
36      +0.13      0.01
250     +0.12      0.01
Negative Logits
Sites         -1.76
Site          -1.52
cookies       -1.49
Primer        -1.38
Your          -1.34
Cookies       -1.34
aste          -1.33
Solution      -1.32
primers       -1.32
accountable   -1.32
Positive Logits
¿½       1.81
Ŀ        1.79
Ł        1.64
Ļª       1.61
turned   1.55
·        1.52
dated    1.48
eting    1.48
aly      1.47
olding   1.47
Activations Density: 0.012%