INDEX
Explanations
occurrences of specific characters or sequences within text
Neuron Alignment
Index    Value    % of L₁
90       +0.13    0.7%
261      +0.12    0.7%
427      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
90       +0.13       0.04
370      +0.12       0.04
133      +0.11       0.03
Negative Logits
s               -1.78
mers            -1.78
)](             -1.59
Inflater        -1.54
.’”             -1.53
).](            -1.48
distribution    -1.45
fax             -1.45
.[]{            -1.44
'>              -1.44
Positive Logits
ĨĴ            2.04
slog          1.79
±             1.78
ĻĤ            1.71
·             1.67
į             1.65
ĭ             1.61
Į             1.53
TRODUCTION    1.52
¿½            1.49
Activations Density 0.200%