Explanations
sports team names and player assignments
Neuron Alignment
Index  Value  % of L₁
1343   +0.16  0.5%
16     +0.13  0.4%
1741   +0.11  0.3%
Correlated Neurons
Index  P. Corr.  Cos Sim.
1343   +0.16     0.05
72     +0.13     0.04
403    +0.11     0.03
Negative Logits
مط  -0.67
반  -0.66
子  -0.65
の  -0.64
が  -0.64
어  -0.64
ें  -0.64
을  -0.63
속  -0.63
下  -0.63
Positive Logits
stockholm  1.88
accla      1.81
fta        1.75
embra      1.72
»>         1.72
desir      1.72
strick     1.71
ftu        1.71
snoopy     1.70
fuf        1.69
Activations Density: 0.266%