INDEX
Explanations
sports-related content, particularly regarding American football teams and games
Neuron Alignment
Index   Value   % of L₁
1506    +0.15   0.6%
920     +0.14   0.6%
553     +0.14   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1506    +0.15      0.03
553     +0.14      0.02
920     +0.14      0.02
Negative Logits
ADATA        -0.56
NKC          -0.56
LUMP         -0.51
'*':         -0.50
firmas       -0.49
TextHelper   -0.49
三国         -0.49
enchymal     -0.48
estancias    -0.48
cocinas      -0.48
Positive Logits
sappi        1.00
scopri       1.00
Ehh          0.97
vorrei       0.96
suscep       0.96
voleva       0.93
purtroppo    0.92
affor        0.92
shenan       0.89
reluct       0.89
Activations Density: 0.068%