INDEX
Explanations
repeated occurrences of the same element or pattern
Neuron Alignment
Index   Value   % of L₁
325     +0.12   0.7%
459     +0.12   0.7%
171     +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
10      +0.12      0.47
98      +0.12      0.21
228     +0.12      0.31
Negative Logits
·¸            -1.66
Ļ             -1.57
ķ             -1.55
actors        -1.52
wonders       -1.40
mention       -1.38
eyed          -1.38
plans         -1.37
predictions   -1.36
¢             -1.36
Positive Logits
+^       1.62
↵ ↵      1.57
%.       1.56
eds      1.53
zyk      1.52
igraph   1.51
/~       1.47
ary      1.46
phab     1.45
ki       1.45
Activations Density 2.888%