Explanations
phrases related to sports events and decisions that garner significant public attention
Neuron Alignment
Index    Value    % of L₁
906      +0.14    0.4%
108      +0.13    0.4%
344      +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
575      +0.14       0.04
112      +0.13       0.02
1501     +0.10       0.03
Negative Logits
recrystal     -0.59
nictwa        -0.58
rodzic        -0.56
STRUCTOR      -0.53
owników       -0.52
skiej         -0.52
maxWidth      -0.51
dziewczyn     -0.50
ską           -0.50
aduras        -0.50
Positive Logits
emphat        1.25
pamph         1.21
McLaugh       1.19
Abbé          1.16
depic         1.15
reluct        1.13
Keny          1.12
accla         1.10
Shakspeare    1.08
inconce       1.07
Activations Density 0.565%