Explanations
Mentions of sports phrases and names, especially those related to football teams and players.
Neuron Alignment
Index    Value    % of L₁
906      +0.10    0.3%
344      +0.10    0.3%
599      +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
922      +0.10       0.06
1748     +0.10       0.03
1501     +0.09       0.05
Negative Logits
Token         Logit
pollut        -1.26
anhyd         -1.24
impractica    -1.05
embodi        -0.95
tetrach       -0.94
unwarran      -0.93
alberto       -0.92
unlaw         -0.91
sherds        -0.90
shewn         -0.90
Positive Logits
Token            Logit
<bos>            0.96
teammates        0.75
Transfermarkt    0.65
teammate         0.62
team             0.61
mmate            0.60
coaching         0.59
locker           0.59
coaches          0.58
coach            0.58
Activations Density: 0.696%