INDEX
Explanations
mentions of the city "Detroit."
Neuron Alignment
Index    Value    % of L₁
1068     +0.19    1.1%
687      +0.18    1.0%
313      +0.16    0.9%
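The "% of L₁" column is not defined on this page. One plausible reading, assumed here and not confirmed by the source, is that it reports each neuron's absolute value as a share of the L1 norm of the whole vector being aligned. A minimal sketch under that assumption follows; the function name `l1_share` and the toy vector are hypothetical and are not data from this dashboard.

```python
from typing import Sequence


def l1_share(vector: Sequence[float], index: int) -> float:
    """Return |vector[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means this ratio; the dashboard does not say so.
    """
    l1_norm = sum(abs(x) for x in vector)
    if l1_norm == 0:
        raise ValueError("a zero vector has no L1 norm to divide by")
    return abs(vector[index]) / l1_norm


# Hypothetical toy vector, chosen so the arithmetic is easy to check by hand.
toy = [0.5, -0.25, 0.25]
print(f"{l1_share(toy, 0):.1%}")  # 0.5 / (0.5 + 0.25 + 0.25) -> 50.0%
```

Under this reading, a value of +0.19 at 1.1% would imply an L1 norm of roughly 17, which is about what the other two rows imply after rounding.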
Correlated Neurons
Index    P. Corr.    Cos Sim.
1363     +0.19       0.04
1677     +0.18       0.03
478      +0.16       0.02
Negative Logits
<bos>        -0.87
intersper    -0.82
Lorsqu       -0.76
encomp       -0.72
reconno      -0.69
apprehen     -0.66
gild         -0.66
Jusqu        -0.65
Pense        -0.65
unve         -0.64
Positive Logits
Detroit    1.02
Detroit    0.97
DETROIT    0.90
detroit    0.90
Det        0.87
ROIT       0.75
PLWABN     0.75
Det        0.72
det        0.71
DET        0.67
Activations Density 0.619%