INDEX
Explanations
instances of conjunctions, particularly "and."
Neuron Alignment
Index    Value    % of L₁
50       +0.38    1.3%
2019     +0.11    0.4%
1265     +0.10    0.3%
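The "% of L₁" column is not defined in this view. One plausible reading, offered here as an assumption rather than the dashboard's documented method, is that each value is reported as its absolute share of the L₁ norm of the full weight vector. The sketch below illustrates that reading on a toy vector; `l1_share`, `top_aligned`, and the toy numbers are hypothetical names and data introduced only for illustration.

```python
def l1_share(weights):
    """Return each weight's absolute value as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The dashboard does not
    state its definition, so treat this as one possible interpretation.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return [0.0 for _ in weights]
    return [abs(w) / l1 for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, share) for the k entries with the largest |value|."""
    shares = l1_share(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


# Toy vector (hypothetical, not taken from the feature shown here).
toy = [0.5, -0.25, 0.125, 0.125]

for index, value, share in top_aligned(toy):
    print(f"{index:<8} {value:+.2f}    {share:.1%}")
```

Under this reading, the three rows above would together account for about 2% of the vector's L₁ norm, which would suggest the feature is spread across many neurons rather than aligned with any single one.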
Correlated Neurons
Index    P. Corr.    Cos Sim.
1265     +0.38       0.05
1806     +0.11       0.04
1415     +0.10       0.05
Negative Logits
intersper     -1.05
felicity      -0.90
encomp        -0.87
liberality    -0.87
Thos          -0.86
quitted       -0.82
Pamph         -0.82
gaily         -0.81
Shakspeare    -0.79
Augu          -0.76
Positive Logits
GYPT          0.74
Muito         0.73
Talvez        0.72
Adicion       0.72
sizePolicy    0.71
Estou         0.69
Hitam         0.68
visející      0.67
confronti     0.67
Saludos       0.65
Activations Density: 0.277%