INDEX
Explanations
the word "Made" in various contexts
Neuron Alignment
Index   Value   % of L₁
376     +0.17   1.0%
443     +0.12   0.7%
192     +0.12   0.7%
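One plausible reading of the "% of L₁" column is each neuron's absolute value as a share of the L₁ norm of the whole alignment vector. If that reading is right, a value of 0.17 at 1.0% would imply an L₁ norm of roughly 17. The sketch below illustrates that interpretation only; the function name `l1_share` and the example vector are hypothetical and are not the data behind this table.

```python
def l1_share(weights):
    """Return each entry's absolute value as a fraction of the vector's L1 norm."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        # An all-zero vector has no meaningful shares; report zeros.
        return [0.0 for _ in weights]
    return [abs(w) / l1 for w in weights]


# Hypothetical 4-entry vector, chosen for illustration (not from the dashboard).
weights = [0.5, -0.25, 0.125, 0.125]
shares = l1_share(weights)
print([f"{s:.1%}" for s in shares])
```

The sign is dropped before dividing, so a strongly negative entry counts as much as a strongly positive one.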
Correlated Neurons
Index   P. Corr.   Cos Sim.
192     +0.17      0.01
1       +0.12      0.01
103     +0.12      0.01
Negative Logits
sing         -1.86
verted       -1.59
maternal     -1.49
gered        -1.43
mit          -1.42
asing        -1.42
ased         -1.41
feet         -1.40
reflection   -1.40
aught        -1.40
Positive Logits
leine    2.20
holm     1.74
iera     1.74
ira      1.71
àµį      1.71
áŁ       1.69
heets    1.69
etable   1.66
aggio    1.66
heet     1.64
Activations Density 0.066%
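Activation density is commonly read as the fraction of sampled tokens on which the feature is active. Under that assumption, 0.066% would correspond to roughly one token in 1,500. The sketch below shows the calculation; the function name `activation_density`, the zero threshold, and the sample values are assumptions for illustration, not the dashboard's actual method or data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold`."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical per-token activations (not from the dashboard): 2 of 8 fire.
acts = [0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.25, 0.0]
density = activation_density(acts)
print(f"{density:.3%}")
```

Raising `threshold` gives a stricter count that ignores weak activations.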