INDEX
Explanations
Instances of the word "is" or related expressions indicating identity or existence.
Neuron Alignment
Index   Value   % of L₁
55      +0.13   0.7%
423     +0.12   0.7%
31      +0.11   0.6%
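A minimal sketch of how a "Neuron Alignment" table like this could be computed. It assumes (not stated on the dashboard) that the feature's decoder direction is expressed in the neuron basis, that "Value" is the raw weight on a neuron, and that "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder vector. The function name and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (index, value, fraction of L1) for the k largest-magnitude weights.

    Assumption: decoder_vec is the feature's decoder direction in the neuron
    basis; "% of L1" is |weight| / sum(|weights|).
    """
    if not decoder_vec:
        raise ValueError("decoder_vec must be non-empty")
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector (made-up numbers, not taken from this feature).
for idx, value, frac in neuron_alignment([0.1, -0.4, 0.3, 0.2]):
    print(f"{idx:<6}{value:+.2f}  {frac:.1%}")
```

Ranking by absolute value is itself an assumption; the explicit "+" signs in the table suggest negative weights could also appear.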
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
17      +0.13           0.20
337     +0.12           0.21
233     +0.11           0.17
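The two statistics in the "Correlated Neurons" table can be sketched as follows. This assumes the Pearson correlation is taken between the feature's activations and a neuron's activations over the same token positions, and that the cosine similarity compares the feature's direction with the neuron's direction; the dashboard does not say which vectors it uses, so treat the inputs as hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy feature and neuron activations over five token positions (made-up data).
feature_acts = [0.0, 1.2, 0.0, 0.8, 0.1]
neuron_acts = [0.1, 0.9, -0.2, 0.7, 0.0]
print(pearson(feature_acts, neuron_acts))
print(cosine_similarity([1.0, 0.5], [0.5, 1.0]))
```

The two numbers answer different questions: correlation measures co-activation on data, while cosine similarity measures geometric overlap of weight directions, which is why a neuron can score modestly on one and differently on the other.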
Negative Logits
pes      -1.59
ature    -1.52
true     -1.51
ees      -1.47
pected   -1.38
auss     -1.37
$%       -1.37
unc      -1.36
well     -1.36
lessly   -1.36
Positive Logits
¿        2.40
©        2.08
³        2.06
²        1.98
Ł        1.89
»        1.88
¦        1.85
¾        1.85
¢        1.79
Ĩ        1.74
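A sketch of how top and bottom logit lists like these are commonly produced: project the feature's decoder direction onto each token's unembedding vector and sort. It assumes a direct path to the unembedding and ignores the final layer norm; `unembed`, `vocab`, and the toy values are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab, k=3):
    """Rank tokens by the direct effect of a feature direction on their logits.

    Assumption: unembed[j] is the unembedding vector for vocab[j], and the
    effect on token j is the dot product decoder_dir . unembed[j]
    (final layer norm ignored).
    """
    effects = [sum(d * u for d, u in zip(decoder_dir, col)) for col in unembed]
    order = sorted(range(len(vocab)), key=lambda j: effects[j])  # ascending
    negative = [(vocab[j], effects[j]) for j in order[:k]]  # most negative first
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]  # most positive first
    return positive, negative


# Toy 2-d model with a 4-token vocabulary (made-up numbers).
pos, neg = logit_effects([1.0, 0.0],
                         [[2.0, 0.0], [-1.0, 0.0], [0.5, 0.0], [-3.0, 0.0]],
                         ["a", "b", "c", "d"], k=2)
print("Positive Logits", pos)
print("Negative Logits", neg)
```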
Activations Density 5.368%
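Assuming "Activations Density" is the fraction of token positions on which the feature's activation is strictly positive (the dashboard does not define it), 5.368% means the feature fires on a little over 5 in every 100 tokens. A minimal sketch of that computation, with a hypothetical function name and threshold parameter:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds threshold (assumed definition)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy activations over eight tokens (made-up data).
print(f"{activation_density([0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.3, 0.0]):.3%}")
```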