INDEX
Explanations
statements related to assertions in test cases
Neuron Alignment
Index   Value   % of L₁
148     +0.12   0.7%
391     +0.11   0.6%
248     +0.09   0.5%
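The Neuron Alignment table pairs each neuron index with a weight and that weight's share of the vector's L₁ norm. The sketch below shows one way such a table can be produced. It assumes the underlying vector is the feature's weight vector over neurons, for example its decoder row, and that rows are ranked by absolute weight. The dashboard does not state either of these. The helper `neuron_alignment` and the toy weights are hypothetical.

```python
def neuron_alignment(weights, k=3):
    """Return (index, value, share of L1 norm) for the k largest-magnitude weights.

    `weights` is assumed to be the feature's weight vector over neurons
    (for example its decoder row); the dashboard does not state this.
    """
    l1 = sum(abs(w) for w in weights)
    # Rank neuron indices by absolute weight, largest first.
    top = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)[:k]
    return [(i, weights[i], abs(weights[i]) / l1) for i in top]


# Hypothetical 6-neuron weight vector; these numbers are not from the dashboard.
toy_weights = [0.05, -0.02, 0.12, 0.01, -0.09, 0.03]
for idx, val, share in neuron_alignment(toy_weights):
    print(f"{idx:<7} {val:+.2f}   {share:.1%}")
```

Under this reading, small percentages such as 0.7% indicate that the weight is spread over many neurons rather than concentrated in a few.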
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
204     +0.12           0.02
101     +0.11           0.02
391     +0.09           0.02
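The Correlated Neurons table reports two different similarity measures, a Pearson correlation (P. Corr.) and a cosine similarity (Cos Sim.). The sketch below computes both, under some assumptions. It assumes the Pearson value compares the feature's activations with a neuron's activations over the same tokens. It assumes the cosine value compares two vectors, but which vectors the dashboard uses is not stated here. The activation lists are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations for one feature and one neuron.
feature_acts = [0.0, 0.0, 1.3, 0.0, 0.7, 0.0]
neuron_acts = [0.1, -0.2, 0.4, 0.0, 0.1, 0.3]
print(f"Pearson correlation: {pearson(feature_acts, neuron_acts):+.2f}")
```

The two measures differ only in mean-centering: Pearson correlation is the cosine similarity of the mean-centered sequences.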
Negative Logits
Token      Logit
ĵ          -2.04
kinase     -1.80
zzle       -1.75
))**       -1.74
,}         -1.71
ife        -1.69
))**(      -1.63
chten      -1.60
))**(-     -1.59
$}         -1.58
Positive Logits
Token          Logit
opera          1.82
pitches        1.66
swings         1.58
orious         1.55
songs          1.53
apparatus      1.48
fires          1.46
fire           1.45
surroundings   1.43
orate          1.43
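The Negative Logits and Positive Logits lists rank tokens by how strongly the feature pushes their output logits down or up. The minimal sketch below makes an assumption about that value. It assumes each value is the feature's direct effect on the logits, meaning its decoder vector dotted with each token's unembedding vector, with the final layer norm ignored. The helper `logit_effects`, the token names, and all vectors are hypothetical.

```python
def logit_effects(decoder, unembed, k=2):
    """Rank tokens by the dot product of a feature's decoder vector with each
    token's unembedding vector; return the k lowest and the k highest.

    Assumes a direct path to the logits and ignores the final layer norm.
    """
    scores = {
        tok: sum(a * b for a, b in zip(decoder, vec)) for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most suppressed tokens, lowest first
    positive = ranked[::-1][:k]    # most promoted tokens, highest first
    return negative, positive


# Hypothetical 2-d decoder vector and unembedding; all values are made up.
decoder = [1.0, -0.5]
unembed = {
    "tok_a": [1.2, -0.6],
    "tok_b": [0.9, -0.2],
    "tok_c": [0.1, 0.1],
    "tok_d": [-0.4, 1.0],
    "tok_e": [-1.0, 1.5],
}
negative, positive = logit_effects(decoder, unembed)
print("Negative:", negative)
print("Positive:", positive)
```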
Activation Density: 0.840%
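The activation density line gives a single percentage. The sketch below assumes it is the fraction of dataset tokens on which the feature's activation is above zero, a threshold the dashboard does not state. Under that reading, 0.840% means the feature fires on roughly 8 or 9 tokens in every thousand. The helper `activation_density` and the toy activations are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds `threshold`."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Hypothetical activations over 1,000 tokens: 4 nonzero values.
acts = [0.0] * 995 + [0.4, 1.1, 0.0, 2.3, 0.7]
print(f"Activation density: {activation_density(acts):.3%}")
```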