INDEX
Explanations
the definite article "the."
Neuron Alignment
Index  Value  % of L₁
232    +0.13  0.7%
343    +0.12  0.6%
323    +0.12  0.6%
Correlated Neurons
Index  P. Corr.  Cos Sim.
454    +0.13     0.00
441    +0.12     0.00
443    +0.12     0.00
Negative Logits
essions    -1.54
tern       -1.48
rese       -1.48
pragma     -1.45
statement  -1.43
rid        -1.43
waived     -1.38
zel        -1.37
STRA       -1.37
bib        -1.35
Positive Logits
µ    3.33
»¿   2.98
£    2.95
·¸   2.91
ĵ    2.90
¿½   2.89
¾    2.88
Ķ    2.82
¸    2.79
±    2.79
Activations Density 0.000%