INDEX
Explanations
mentions of sports teams, games, and athletic competitions
Neuron Alignment
Index    Value    % of L₁
1501     +0.08    0.3%
690      +0.07    0.2%
50       +0.07    0.2%
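
The "% of L₁" column can be read as each neuron's share of the feature vector's total absolute weight. The sketch below illustrates that reading under stated assumptions: the function name `top_neuron_alignment`, the ranking by absolute value, and the toy vector are all hypothetical and are not taken from this page or from the tool that produced it.

```python
import numpy as np

def top_neuron_alignment(decoder_vector, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |w_i| / sum_j |w_j| for the feature's
    weight vector w. This is an illustrative reading, not a confirmed one.
    """
    w = np.asarray(decoder_vector, dtype=float)
    l1 = np.abs(w).sum()
    if l1 == 0.0:
        return []  # an all-zero vector has no meaningful shares
    order = np.argsort(-np.abs(w))[:k]  # indices sorted by descending |w_i|
    return [(int(i), float(w[i]), float(abs(w[i]) / l1)) for i in order]

# Toy vector for illustration only (not data from this page).
toy = np.array([0.5, -0.25, 0.125, 0.125])
rows = top_neuron_alignment(toy, k=2)
for index, value, share in rows:
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

Small shares such as those in the table above would, under this reading, indicate a feature whose weight is spread across many neurons rather than concentrated in a few.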
Correlated Neurons
Index    P. Corr.    Cos Sim.
690      +0.08       0.06
1501     +0.07       0.05
81       +0.07       0.04
Negative Logits
<bos>        -1.22
public       -0.79
find         -0.74
re           -0.72
text         -0.72
set          -0.72
protected    -0.72
clear        -0.71
.            -0.71
do           -0.70
Positive Logits
maneu     1.98
ftu       1.96
thut      1.96
jaya      1.96
fta       1.94
aen       1.93
increa    1.92
affor     1.90
emphat    1.90
Juf       1.90
Activations Density 0.186%
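
An activation density figure like the one above is commonly understood as the fraction of sampled tokens on which the feature fires. The following is a minimal sketch of that calculation, assuming "fires" means an activation strictly above zero; the function name `activation_density`, the threshold, and the toy data are hypothetical and do not come from this page.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density counts activations strictly greater than the
    threshold (0.0 by default); the source page does not state its rule.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        return 0.0  # no tokens sampled, so report zero density
    return float((acts > threshold).mean())

# Toy activations for illustration only (not data from this page).
toy_acts = np.array([0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.2, 0.0])
density = activation_density(toy_acts)
print(f"Activations Density {density:.3%}")
```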