INDEX
Explanations
terms related to reviewing, reassessing, or repeating actions or processes
Neuron Alignment
Index   Value   % of L₁
1896    +0.13   0.5%
58      +0.13   0.4%
872     +0.12   0.4%
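The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it reports a neuron's absolute value as a share of the L₁ norm (sum of absolute values) of the whole vector. The sketch below illustrates that reading only; the function name l1_share and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means abs(value) / sum(abs(all values)).
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(weights[index]) / l1_norm


# Toy vector with made-up values, chosen so the arithmetic is exact.
toy = [0.5, -0.25, 0.25]
share = l1_share(toy, 0)
print(f"{share:.1%}")
```

Under this reading, a value of about 0.13 at a share of roughly 0.4% to 0.5% would imply that the weight is spread thinly across many neurons rather than concentrated in a few.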
Correlated Neurons
Index   P. Corr.   Cos Sim.
58      +0.13      0.04
404     +0.13      0.04
1896    +0.12      0.03
Negative Logits
Token               Logit
<bos>               -0.73
to                  -0.65
for                 -0.64
(no visible text)   -0.64
of                  -0.63
has                 -0.62
as                  -0.61
损伤                -0.61
on                  -0.61
at                  -0.60
Positive Logits
Token   Logit
dises   1.70
erec    1.62
vito    1.61
canel   1.60
nece    1.58
pican   1.58
embra   1.57
parma   1.55
shur    1.54
aton    1.54
Activations Density: 0.073%