INDEX
Explanations
sports-related terms, specifically in the context of football
Neuron Alignment
Index  Value  % of L₁
381    +0.13  0.4%
658    +0.10  0.3%
906    +0.10  0.3%
Correlated Neurons
Index  P. Corr.  Cos Sim.
658    +0.13     0.05
599    +0.10     0.05
605    +0.10     0.01
Negative Logits
alberto  -1.73
roberto  -1.71
chery    -1.70
sergio   -1.68
claudia  -1.68
jorge    -1.67
gabri    -1.63
ricardo  -1.62
javier   -1.62
embodi   -1.62
Positive Logits
<bos>      1.20
myself     0.68
my         0.66
I          0.65
feeling    0.65
hopefully  0.64
yeah       0.61
AndEndTag  0.61
gonna      0.61
me         0.61
Activations Density: 0.292%