Explanations
phrases related to recommendations and efforts to refine strategies
Neuron Alignment
Index    Value    % of L₁
2019     +0.17    0.6%
50       +0.15    0.5%
1415     +0.15    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1415     +0.17       0.08
1334     +0.15       0.07
866      +0.15       0.04
Negative Logits
lele       -1.52
fta        -1.52
territo    -1.52
Meksi      -1.49
cannes     -1.45
kamb       -1.44
maksi      -1.44
silikon    -1.43
Lég        -1.41
matel      -1.38
Positive Logits
see           0.77
find          0.76
avoid         0.75
introduce     0.74
observe       0.74
determine     0.74
understand    0.72
learn         0.72
obtain        0.72
assume        0.72
Activations Density: 0.295%