INDEX
Explanations
phrases emphasizing elements of quantity or existence
Neuron Alignment
Index   Value   % of L₁
411     +0.13   0.7%
430     +0.12   0.6%
268     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
308     +0.13      0.07
89      +0.12      0.07
402     +0.11      0.07
Negative Logits
packed   -1.87
amazon   -1.67
opia     -1.63
strain   -1.56
KS       -1.55
Street   -1.54
tensor   -1.52
gas      -1.49
tor      -1.46
Fried    -1.45
POSITIVE LOGITS
¦        2.23
į        1.86
¯        1.85
µ        1.81
Ĺ        1.80
º        1.72
°        1.66
·        1.65
ĨĴ       1.64
unless   1.61
Activations Density 0.131%