INDEX
Explanations
instances of the word "sign."
Neuron Alignment
Index   Value   % of L₁
59      +0.13   0.7%
172     +0.11   0.6%
376     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
126     +0.13      0.02
243     +0.11      0.01
59      +0.10      0.01
Negative Logits
Token            Logit
undry            -1.64
")]              -1.59
thee             -1.54
'))              -1.51
laid             -1.50
'));             -1.50
'>               -1.49
rian             -1.43
"));             -1.43
AndroidRuntime   -1.40
Positive Logits
Token         Logit
ĩ             2.00
ored          1.89
ificantly     1.87
orable        1.86
ĻĤ            1.72
bugs          1.63
submissions   1.56
stones        1.46
orest         1.46
rams          1.45
Activations Density: 0.903%