INDEX
Explanations
mentions of specific locations
Neuron Alignment
Index   Value   % of L₁
2034    +0.09   0.3%
1741    +0.08   0.2%
1288    +0.08   0.2%
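The page does not say how the Neuron Alignment table is computed. One plausible reading, which is an assumption here, is that Value is the feature's decoder weight on each neuron (or basis dimension) and % of L₁ is that weight's magnitude divided by the L1 norm of the whole decoder vector. The following minimal Python sketch implements that reading. The `neuron_alignment` helper and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Return the top-k (index, value, share_of_l1) entries of a decoder vector.

    Hypothetical helper: assumes 'Value' is the raw decoder weight on a neuron
    and '% of L1' is |value| divided by the L1 norm of the full decoder vector.
    """
    l1 = sum(abs(w) for w in decoder_row)  # L1 norm of the whole vector
    # Rank indices by weight, largest positive first (sort is stable on ties).
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy 4-dimensional decoder vector whose L1 norm is exactly 1.0.
print(neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2))
```

Under this reading, a top share of only 0.3% means no single neuron carries more than a small fraction of the feature's decoder weight. That suggests the feature is not aligned with any individual neuron.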
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
2010    +0.09           0.06
588     +0.08           0.07
1589    +0.08           0.04
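The page does not state which quantities these two columns compare. The sketch below therefore only shows the two statistics on toy data. Pearson correlation is assumed to be taken between feature and neuron activations across tokens. The `pearson` and `cosine` helpers are illustrative, not taken from the source.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance of the mean-centred series over the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity: dot product over the product of norms (no mean-centring)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # perfectly correlated toy series
print(cosine([1.0, 0.0], [0.0, 1.0]))             # orthogonal toy vectors
```

Pearson correlation subtracts each series' mean before comparing, while cosine similarity does not. The two columns can therefore differ for the same neuron, as in the +0.09 / 0.06 row above.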
Negative Logits
sappi      -1.16
parteci    -0.89
succede    -0.83
apparti    -0.80
ridu       -0.80
vuol       -0.79
bbene      -0.79
migli      -0.79
inol       -0.77
altrett    -0.77
Positive Logits
<bos>   0.95
there   0.90
we      0.75
,       0.71
they    0.69
you     0.68
alone   0.64
it      0.64
there   0.63
There   0.55
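Positive and negative logit lists like these are commonly produced by projecting the feature's decoder direction onto each token's unembedding vector, then keeping the highest and lowest scores. The page does not confirm this method, and real pipelines may also apply a final normalization. The sketch below assumes the plain projection. The `logit_effects` helper and the toy two-dimensional vocabulary are hypothetical.

```python
def logit_effects(decoder_dir, unembed, k=3):
    """Score every token by dot(decoder_dir, unembedding[token]).

    Hypothetical helper: decoder_dir is the feature's direction in model space,
    unembed maps token -> unembedding vector (a column of W_U).
    Returns (top_k, bottom_k) as lists of (token, score) pairs.
    """
    scores = [
        (tok, sum(a * b for a, b in zip(decoder_dir, vec)))
        for tok, vec in unembed.items()
    ]
    scores.sort(key=lambda p: p[1], reverse=True)  # highest score first
    return scores[:k], scores[::-1][:k]           # most positive, most negative


toy_unembed = {"a": [2.0, 0.0], "b": [-1.0, 5.0], "c": [0.5, 1.0], "d": [-3.0, 0.0]}
top, bottom = logit_effects([1.0, 0.0], toy_unembed, k=2)
print(top)     # tokens the feature direction promotes
print(bottom)  # tokens the feature direction suppresses
```

Read this way, the lists above indicate that the feature promotes tokens such as "there", "we" and "they" and suppresses Italian word-pieces such as "sappi" and "parteci".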
Activation Density: 0.585%
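Activation density is usually reported as the fraction of sampled tokens on which the feature fires. That is assumed here to mean an activation above zero. Under that assumption, 0.585% corresponds to about 585 active tokens per 100,000 sampled. The `activation_density` helper and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds the threshold (hypothetical helper)."""
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 585 firing tokens out of 100,000 sampled gives a density of 0.585%.
acts = [1.0] * 585 + [0.0] * (100_000 - 585)
print(f"{activation_density(acts):.3%}")
```
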