INDEX
Explanations
instances of the word "featuring."
Neuron Alignment
Index    Value    % of L₁
376      +0.27    1.6%
156      +0.16    0.9%
23       +0.14    0.8%
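The "% of L₁" column is not defined on this page. A minimal sketch of one plausible reading, assuming it is each neuron's absolute weight divided by the L₁ norm of the feature's full weight vector; the function name `top_aligned` and the toy weights are hypothetical and are not taken from the dashboard:

```python
def top_aligned(weights, k=3):
    """Rank neurons by |weight| and report each one's share of the L1 norm.

    Assumption: "% of L1" = |w_i| / sum_j |w_j|. The dashboard may compute
    it differently (for example, ranking by signed value).
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("weight vector has zero L1 norm")
    # Indices of the k largest-magnitude weights, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1_norm) for i in ranked[:k]]


# Toy weight vector (hypothetical): the L1 norm is exactly 1.0.
toy_weights = [0.5, -0.25, 0.0, 0.25]
for index, value, share in top_aligned(toy_weights, k=2):
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

Under this reading, a row such as +0.27 at 1.6% would imply an L₁ norm of roughly 17 for the full vector, which is at least consistent with the other two rows after rounding.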
Correlated Neurons
Index    P. Corr.    Cos Sim.
443      +0.27       0.01
306      +0.16       0.01
46       +0.14       0.01
Negative Logits
ian        -1.70
orer       -1.56
si         -1.50
shire      -1.46
idopsis    -1.46
abel       -1.45
ierno      -1.44
doi        -1.42
lew        -1.42
ardi       -1.40
Positive Logits
Ļ     2.39
Ĥ     2.09
ł     2.08
¢     2.08
¬     2.06
³     1.99
Ģ     1.99
Ķ     1.98
¤     1.98
ľĵ    1.93
Activations Density 0.282%