Explanations
occurrences of the definite article "the."
Neuron Alignment
Index    Value    % of L₁
471      +0.12    0.6%
30       +0.11    0.6%
272      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
10       +0.12       0.48
98       +0.11       0.21
228      +0.11       0.32
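The two columns above can disagree noticeably (for example +0.12 against 0.48 for neuron 10). A minimal sketch of why, assuming "P. Corr." means Pearson correlation and "Cos Sim." means cosine similarity, both taken over paired activation values: Pearson correlation is cosine similarity after each series is mean-centered, so a shared positive offset raises the cosine score without raising the correlation. The function names and sample vectors below are hypothetical and are not taken from the dashboard.

```python
import math

def cosine_sim(x, y):
    """Cosine similarity of two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    x_centered = [a - mean_x for a in x]
    y_centered = [b - mean_y for b in y]
    return cosine_sim(x_centered, y_centered)

# Hypothetical activation traces: y is x shifted up by a constant offset.
x = [0.0, 0.0, 1.0, 3.0]
y = [1.0, 1.0, 2.0, 4.0]

cos_xy = cosine_sim(x, y)        # below 1: the offset changes the direction
pearson_xy = pearson_corr(x, y)  # about 1: centering removes the offset
```

Here the two traces are perfectly correlated, yet their cosine similarity is below 1, because only the Pearson measure discards the constant offset.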
Negative Logits
urious      -1.51
uel         -1.48
racket      -1.48
ULAR        -1.47
same        -1.44
bia         -1.43
spacing     -1.37
friendly    -1.37
eeee        -1.37
ea          -1.35
Positive Logits
«           2.63
·           2.14
¨           2.03
Ļª          2.02
ŀ           2.00
©           1.99
¬           1.99
ļ           1.90
²           1.90
ĻĤ          1.86
Activations Density 2.924%