Explanations
mentions of movies and actors
Neuron Alignment
Index    Value    % of L₁
90       +0.11    0.4%
1392     +0.10    0.3%
1096     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
90       +0.11       0.03
1392     +0.10       0.03
990      +0.10       0.02
Negative Logits
⇨    -0.53
亘    -0.51
 ̄    -0.51
المعيارى    -0.50
toBe    -0.49
BoxShadow    -0.49
Abs    -0.49
SAX    -0.48
Disk    -0.47
]<=    -0.47
Positive Logits
intersper    1.24
hairc    1.16
maneu    1.08
unspeak    1.04
increa    1.01
fuf    1.01
shenan    1.01
intermitt    1.00
boop    0.99
apprehen    0.98
Activations Density: 0.053%