INDEX
Explanations
phrases related to sports, particularly football and baseball
Neuron Alignment
Index    Value    % of L₁
382      +0.10    0.3%
974      +0.09    0.2%
674      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
974      +0.10       0.04
2032     +0.09       0.04
534      +0.08       0.04
Negative Logits
kram       -0.85
traktor    -0.81
lemp       -0.80
teras      -0.79
labd       -0.79
augus      -0.78
kac        -0.77
tomat      -0.76
lele       -0.76
alkoh      -0.76
Positive Logits
reluct         0.78
indeed         0.75
disreg         0.75
philanth       0.73
fortunately    0.72
shenan         0.69
thankfully     0.69
indeed         0.68
unve           0.67
luckily        0.66
Activations Density: 0.458%