INDEX
Explanations
occurrences of the word 'detected' and related forms
Neuron Alignment
Index    Value    % of L₁
376      +0.23    1.4%
156      +0.20    1.2%
148      +0.18    1.1%
Correlated Neurons
Index    P. Corr.    Cos Sim.
308      +0.23       0.01
254      +0.20       0.01
294      +0.18       0.01
Negative Logits
ĭ        -3.63
ĥ½       -3.56
ı        -3.10
IJ       -2.95
ĨĴ       -2.95
į        -2.94
ĸ        -2.87
»¿       -2.86
½        -2.86
Ĺ        -2.83
Positive Logits
gaps         1.67
utter        1.52
ently        1.48
him          1.40
bullet       1.35
rid          1.34
spot         1.34
intervals    1.34
ty           1.33
wing         1.32
Activations Density 0.007%