INDEX
Explanations
sentences that end with a comma followed by a numerical activation, potentially focusing on a specific type of syntactic structure
Neuron Alignment
Index   Value   % of L₁
382     +0.15   0.5%
1741    +0.14   0.4%
2034    +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.15      0.06
1534    +0.14      0.03
736     +0.12      0.05
Negative Logits
sappi     -1.20
milano    -1.13
deleter   -1.10
mef       -1.08
légiti    -1.05
embodi    -1.03
renou     -1.03
obb       -1.01
dises     -1.01
lancia    -1.00
Positive Logits
it         0.88
they       0.87
he         0.79
she        0.76
there      0.75
we         0.72
you        0.63
they       0.63
I          0.60
everyone   0.60
Activations Density 0.266%