Explanations
references to specific sports teams and players
Neuron Alignment
Index   Value   % of L₁
1177    +0.20   0.7%
1741    +0.16   0.5%
856     +0.15   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1177    +0.20      0.05
283     +0.16      0.01
1510    +0.15      0.06
Negative Logits
Token   Logit
<bos>   -1.62
in      -0.93
no      -0.93
to      -0.92
at      -0.89
for     -0.89
so      -0.88
.       -0.88
as      -0.86
on      -0.85
Positive Logits
Token    Logit
hcm      2.49
mef      2.28
lele     2.26
alkoh    2.24
immen    2.22
kram     2.22
franz    2.21
antik    2.20
cannes   2.19
„,       2.19
Activations Density: 0.610%