INDEX
Explanations
names or variables that are abbreviated using the format initial<tab>capitalized
Neuron Alignment
Index   Value   % of L₁
1177    +0.12   0.4%
964     +0.12   0.4%
453     +0.12   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.12      0.06
1001    +0.12      0.05
199     +0.12      0.03
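The Correlated Neurons table reports two different similarity measures per neuron. The sketch below illustrates how they can diverge, assuming "P. Corr." means Pearson correlation and that both measures are taken over the same pair of vectors; the page does not say how it computes either value. The function names and activation values are hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson_correlation(x, y):
    """Pearson correlation, i.e. cosine similarity after mean-centering."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)

# Hypothetical activations (not taken from the dashboard).
# The neuron is an affine function of the feature: neuron = 1 + 0.5 * feature.
feature_acts = [0.0, 0.0, 1.0, 0.0, 2.0]
neuron_acts = [1.0, 1.0, 1.5, 1.0, 2.0]

p_corr = pearson_correlation(feature_acts, neuron_acts)
cos_sim = cosine_similarity(feature_acts, neuron_acts)
print(f"Pearson: {p_corr:.3f}, cosine: {cos_sim:.3f}")
```

Because Pearson correlation removes each vector's mean before comparing, a constant offset in one vector leaves it unchanged while it does change the cosine similarity. That is one reason the two columns need not agree.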
Negative Logits
Token   Logit
<bos>   -0.76
.       -0.60
!       -0.59
and     -0.59
a       -0.58
or      -0.58
of      -0.58
(       -0.58
s       -0.57
to      -0.57
Positive Logits
Token     Logit
alkoh     1.75
kram      1.67
makro     1.61
silikon   1.58
keramik   1.56
uhr       1.55
antik     1.53
maksi     1.53
kac       1.53
kompati   1.52
Activations Density: 0.124%