INDEX
Explanations
instances of the word "innovative."
Neuron Alignment
Index   Value   % of L₁
156     +0.41   2.6%
376     +0.17   1.1%
246     +0.13   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
156     +0.41      0.01
246     +0.17      0.01
51      +0.13      0.01
Negative Logits
enson    -1.60
shelf    -1.54
oto      -1.51
auer     -1.48
ene      -1.42
arden    -1.39
heimer   -1.37
.“       -1.33
Prize    -1.32
atche    -1.30
Positive Logits
ł    2.51
¢    2.47
Īĺ   2.21
ķ    2.20
¯    2.12
ļ    2.07
Ŀ    2.04
ŀ    2.02
ĸ    2.02
Ħ    1.98
Activations Density 0.019%