INDEX
Explanations
instances of the word "this."
Neuron Alignment
Index   Value   % of L₁
93      +0.13   0.7%
99      +0.12   0.7%
86      +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
402     +0.13      0.16
26      +0.12      0.14
199     +0.10      0.14
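The two score columns in the Correlated Neurons table can differ for the same neuron, and a small sketch may help show why. It assumes "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity; the page does not say which vectors it compares, so the function names and activation values below are hypothetical and purely illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def cosine(x, y):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Made-up activations over five tokens (NOT taken from the dashboard).
feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron_acts = [0.5, 0.4, 1.5, 0.6, 1.0]

print("Pearson:", round(pearson(feature_acts, neuron_acts), 3))
print("Cosine: ", round(cosine(feature_acts, neuron_acts), 3))
```

Pearson correlation is unchanged when a constant offset is added to either vector, while cosine similarity is not, which is one reason the two columns need not agree.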
Negative Logits
¿½    -3.45
Ĥ¬    -3.27
ĥ½    -3.18
Ļª    -3.13
Į     -3.13
ķ     -3.07
ĨĴ    -3.05
Ĭ     -3.05
ĺ     -2.98
§     -2.97
Positive Logits
\].            1.66
](             1.59
ilic           1.52
RSOS           1.51
imeters        1.50
([*            1.48
arrant         1.48
.](            1.40
Availability   1.38
ress           1.36
Activations Density 0.358%