Explanations
Instances of a token sequence that starts with "[" and ends with "]".
Neuron Alignment
Index    Value    % of L₁
301      +0.14    0.8%
185      +0.11    0.6%
289      +0.10    0.6%
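The page does not define the "% of L₁" column. One plausible reading, stated here as an assumption, is that it reports each neuron's absolute value as a share of the L₁ norm of the feature's weight vector. Under that reading, the rows above would imply an L₁ norm of roughly 17 to 18. The sketch below illustrates the calculation on a toy vector; the function name `l1_shares` and the toy weights are hypothetical and are not taken from the dashboard.

```python
def l1_shares(weights, top_k=3):
    """Return (index, value, share_of_l1) for the top_k largest values.

    Assumption: "% of L1" means abs(value) divided by the L1 norm
    (sum of absolute values) of the whole vector.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank indices by signed value, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:top_k]]


# Toy vector for illustration only, not the feature's real weights.
toy = [0.5, -0.25, 0.125, 0.0625, 0.0625]
rows = l1_shares(toy, top_k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Because the share is taken against the whole vector, small percentages such as those in the table would indicate that no single neuron dominates the feature's direction.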
Correlated Neurons
Index    P. Corr.    Cos Sim.
354      +0.14       0.07
51       +0.11       0.05
188      +0.10       0.06
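The two columns can differ for the same pair of vectors. Assuming "P. Corr." abbreviates Pearson correlation (the page does not spell it out, nor does it say which vectors are compared), the sketch below shows why: Pearson correlation is the cosine similarity of mean-centered vectors, so a constant offset changes the cosine similarity but not the correlation. The activation values are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical activations: the second series is the first plus a constant.
feature_acts = [0.0, 0.0, 1.0, 3.0]
neuron_acts = [1.0, 1.0, 2.0, 4.0]

pearson = pearson_correlation(feature_acts, neuron_acts)
cosine = cosine_similarity(feature_acts, neuron_acts)
print(f"Pearson: {pearson:.3f}, cosine: {cosine:.3f}")
```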
Negative Logits
İ    -2.06
Ļ    -1.91
Ĵ    -1.80
Ł    -1.77
¦    -1.76
į    -1.70
ª    -1.69
Ħ    -1.68
ĩ    -1.66
Ķ    -1.62
Positive Logits
)\]       1.69
ourse     1.59
ergic     1.59
Mice      1.58
ous       1.54
lin       1.49
.-        1.48
ocial     1.47
®         1.45
omyces    1.44
Activations Density: 0.039%