INDEX
Explanations
special characters or punctuation symbols in the text
Neuron Alignment
Index   Value   % of L₁
478     +0.15   0.8%
386     +0.13   0.7%
165     +0.12   0.7%
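
The "% of L₁" column is not defined on this page. One plausible reading, offered here only as an assumption, is that it gives each neuron's share of the L₁ norm of the feature's direction vector: the absolute value of the entry divided by the sum of the absolute values of all entries. A minimal sketch under that assumption follows; the function name `l1_share` and the toy vector are hypothetical and are not data from this dashboard.

```python
def l1_share(weights, top_k=3):
    """Return (index, value, share_of_l1) for the top_k largest-magnitude entries.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The dashboard itself
    does not state its formula, so treat this as an illustration only.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank indices by absolute value, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:top_k]]


# Toy vector chosen so the arithmetic is exact; not taken from the page.
toy = [0.5, -0.25, 0.125, 0.125]
rows = l1_share(toy, top_k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, small shares such as 0.7% to 0.8% for the top neurons would indicate that the feature's weight is spread across many neurons rather than concentrated in a few.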
Correlated Neurons
Index   P. Corr.   Cos Sim.
165     +0.15      0.03
192     +0.13      0.01
386     +0.12      0.01
Negative Logits
ity        -1.69
itary      -1.62
carriers   -1.58
ellar      -1.51
inals      -1.50
ashi       -1.50
ames       -1.48
carrier    -1.47
ynam       -1.43
unes       -1.41
Positive Logits
IJ          2.18
kwargs      2.03
Īĺ          2.00
ĥ           1.97
¢           1.96
³           1.81
Ĩ           1.81
Competing   1.77
½           1.74
Ķ           1.71
Activations Density 0.106%