INDEX
Explanations
data related to population statistics and demographic information
Neuron Alignment
Index   Value   % of L₁
63      +0.12   0.7%
313     +0.11   0.6%
204     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
385     +0.12      0.01
288     +0.11      0.02
497     +0.11      0.01
Negative Logits
lectric    -1.50
mathscr    -1.45
while      -1.45
"}](#      -1.44
router     -1.43
act        -1.43
suits      -1.38
cleanup    -1.37
ellees     -1.37
'))        -1.31
Positive Logits
age          1.58
Caption      1.56
Lik          1.53
popularity   1.48
ratings      1.48
—.           1.44
veteran      1.41
adian        1.41
thood        1.39
accession    1.33
Activations Density 0.164%