INDEX
Explanations
references to various types of accidents
Neuron Alignment
Index   Value   % of L₁
156     +0.19   1.1%
148     +0.11   0.6%
376     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
398     +0.19      0.02
297     +0.11      0.02
315     +0.11      0.02
Negative Logits
ĥ½        -1.52
¹         -1.43
OH        -1.37
dk        -1.31
Supreme   -1.30
answer    -1.29
↵         -1.28
↵↵        -1.28
          -1.28
          -1.28
Positive Logits
scenes    1.76
theless   1.57
arial     1.53
imet      1.47
proof     1.46
areas     1.43
periods   1.42
igraph    1.42
table     1.42
ulated    1.41
Activations Density 0.017%