INDEX
Explanations
terms related to duration and possibly limitation in time
Neuron Alignment
Index   Value   % of L₁
156     +0.15   0.8%
404     +0.12   0.7%
375     +0.12   0.7%
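A minimal sketch of how a Neuron Alignment table like the one above could be computed. It rests on assumptions the page does not state: Value is taken to be the feature's decoder weight onto each MLP neuron, and % of L₁ is taken to be that weight's absolute value divided by the L₁ norm of the whole decoder vector. The helper name `neuron_alignment` and the toy vector are hypothetical.

```python
from typing import List, Tuple


def neuron_alignment(decoder_row: List[float], k: int = 3) -> List[Tuple[int, float, float]]:
    """Hypothetical helper: top-k neurons by decoder weight.

    Assumes decoder_row holds one weight per MLP neuron for a single feature,
    and that "% of L1" means |w_i| / sum_j |w_j| (returned here as a fraction).
    """
    l1 = sum(abs(w) for w in decoder_row)
    if l1 == 0:
        raise ValueError("decoder vector has zero L1 norm")
    # Rank neuron indices by signed weight, largest first.
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy 5-neuron decoder vector, not data from this dashboard.
print(neuron_alignment([0.0, 0.5, -0.25, 0.25, 0.0], k=2))
# [(1, 0.5, 0.5), (3, 0.25, 0.25)]
```

Multiplying the third element by 100 gives the percentage form shown in the table.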
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
78      +0.15           0.01
404     +0.12           0.01
375     +0.12           0.01
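The two columns of the Correlated Neurons table are standard statistics. A minimal sketch, assuming (not stated on the page) that the Pearson column correlates the feature's activations with a neuron's activations over the same tokens, and that the cosine column compares two weight vectors; which vectors the dashboard actually compares is not specified here. The toy inputs are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length activation series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has undefined correlation")
    return cov / math.sqrt(sx * sy)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / math.sqrt(nu * nv)


# Toy values only: the neuron fires exactly twice as strongly as the feature.
feature_acts = [0.0, 1.0, 2.0]
neuron_acts = [0.0, 2.0, 4.0]
print(pearson(feature_acts, neuron_acts))  # 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))      # 0.0
```

Because the two measures look at different things (co-activation versus geometric alignment), a neuron can show a positive correlation alongside a near-zero cosine similarity.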
Negative Logits
ene         -1.91
ledge       -1.61
ses         -1.60
ĥ½          -1.55
ward        -1.55
ifically    -1.53
elij        -1.52
ail         -1.52
faces       -1.48
ribe        -1.45
Positive Logits
"}](#       1.98
]'          1.62
]",         1.55
)',         1.52
'+          1.49
ontal       1.45
Parenthood  1.44
)"          1.39
uto         1.36
?'          1.36
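Lists like Negative Logits and Positive Logits are commonly produced by projecting a feature's output direction through the model's unembedding matrix and reading off the most negative and most positive entries. A minimal sketch under that assumption follows. It ignores the final LayerNorm, and `logit_effects`, its argument layout, and the toy matrix are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: top-k positive and negative direct logit effects.

    direction: the feature's output direction in the residual stream (length d).
    unembed:   a d x V matrix as a list of rows, one row per residual dimension.
    vocab:     the V token strings, in the same column order as unembed.
    """
    d, n_tokens = len(direction), len(vocab)
    # score[t] = sum_i direction[i] * unembed[i][t], i.e. direction @ W_U
    scores = [sum(direction[i] * unembed[i][t] for i in range(d)) for t in range(n_tokens)]
    most_negative = sorted(range(n_tokens), key=lambda t: scores[t])[:k]
    most_positive = sorted(range(n_tokens), key=lambda t: scores[t], reverse=True)[:k]
    positive = [(vocab[t], scores[t]) for t in most_positive]
    negative = [(vocab[t], scores[t]) for t in most_negative]
    return positive, negative


# Toy 2-dimensional residual stream and 3-token vocabulary.
pos, neg = logit_effects([1.0, 0.0], [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print(pos, neg)  # [('a', 2.0)] [('b', -1.0)]
```

Sorting twice keeps the code short. For a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid fully sorting the scores.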
Activation Density: 0.048%
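Activation density is usually read as the share of tokens on which the feature is active. Assuming "active" means an activation above zero (the page does not give the threshold), 0.048% corresponds to roughly one token in 2,000. A minimal sketch, with the hypothetical helper `activation_density`:

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not acts:
        raise ValueError("no activations given")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Toy activations for four tokens; only one is active.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```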