INDEX
Explanations: references to the music genre rock
Neuron Alignment
Index   Value   % of L₁
156     +0.15   0.9%
410     +0.13   0.8%
99      +0.13   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
443     +0.15      0.02
99      +0.13      0.02
45      +0.13      0.01
Negative Logits
hers          -1.84
weeks         -1.66
suppose       -1.62
comment       -1.59
certain       -1.54
TRODUCTION    -1.52
slightest     -1.52
fetal         -1.51
wait          -1.49
øre           -1.48
Positive Logits
stars         2.33
chip          2.02
breaker       1.98
ford          1.90
stead         1.87
ings          1.86
cloth         1.84
oir           1.83
fulness       1.83
ups           1.82
Activations Density: 0.119%