Explanations
references to identification or identity concepts
Neuron Alignment
Index   Value   % of L₁
148     +0.13   0.7%
253     +0.12   0.7%
108     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
7       +0.13      0.02
395     +0.12      0.02
475     +0.12      0.02
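
The "P. Corr." and "Cos Sim." columns presumably abbreviate Pearson correlation and cosine similarity. The page does not say which vectors are compared, so the following is only a minimal sketch of the two measures on made-up toy vectors; the function names `pearson` and `cosine` are illustrative and not taken from the dashboard.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Center both sequences before comparing them.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    scale = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / scale

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (no centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy data (assumed, not from the dashboard).
acts_a = [0.0, 1.0, 2.0, 3.0]
acts_b = [1.0, 3.0, 5.0, 7.0]
print(pearson(acts_a, acts_b))
print(cosine(acts_a, acts_b))
```

The only difference between the two is the centering step: Pearson correlation is the cosine similarity of the mean-subtracted vectors, which is why the two columns can disagree for the same pair.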
Negative Logits
Token     Logit
neither   -1.65
.");      -1.59
.")       -1.54
Ĺ         -1.53
.").      -1.53
thee      -1.52
");       -1.52
Ĥ         -1.52
"))       -1.51
both      -1.49
Positive Logits
Token     Logit
iary      2.29
nier      1.69
era       1.66
ifiers    1.63
face      1.63
ulator    1.54
rams      1.51
fony      1.49
rays      1.49
ités      1.48
Activations Density 0.016%