INDEX
Explanations
mentions of specific names or titles
Neuron Alignment
Index   Value   % of L₁
1034    +0.13   0.5%
1133    +0.12   0.5%
1350    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.13      0.06
1406    +0.12      0.04
1056    +0.12      0.05
Negative Logits
ethene        -0.71
ethane        -0.64
toluene       -0.56
aniline       -0.54
acetate       -0.52
perus         -0.52
earnestness   -0.51
ltä           -0.51
lamino        -0.50
Iné           -0.50
Positive Logits
RY     0.97
DY     0.97
dovr   0.95
hy     0.92
hy     0.90
Hy     0.90
Hy     0.90
LY     0.89
Dy     0.89
HY     0.88
Activations Density 0.279%