INDEX
Explanations
conjunctions and phrases indicating additional information or connection
Neuron Alignment
Index    Value    % of L₁
140      +0.13    0.7%
288      +0.12    0.7%
238      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
50       +0.13       0.07
494      +0.12       0.06
204      +0.12       0.08
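The two columns above report different similarity measures, which is why their values can differ for the same neuron. The sketch below shows the standard definitions of Pearson correlation and cosine similarity in plain Python. It is illustrative only: the page does not say which vectors are compared for each column, and the function names and sample vectors here are hypothetical, not taken from the dashboard.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after mean-centering."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical vectors (not data from this page): b is a shifted copy of a.
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]

print(cosine_similarity(a, b))    # below 1: the shift changes the angle
print(pearson_correlation(a, b))  # approximately 1: centering removes the shift
```

Because Pearson correlation subtracts the mean first, it ignores constant offsets that cosine similarity is sensitive to, so the two measures need not agree in magnitude.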
Negative Logits
į     -1.86
ĸ     -1.77
ī     -1.76
¡     -1.66
İ     -1.63
ĸ´    -1.60
²     -1.55
ļ     -1.54
ĩ     -1.51
Ľ     -1.50
Positive Logits
etc          2.11
even         1.89
naments      1.71
...)         1.70
other        1.69
otherwise    1.69
others       1.60
...)         1.55
alike        1.52
whatever     1.52
Activations Density 0.440%