INDEX
Explanations
legal references and case details
Neuron Alignment
Index    Value    % of L₁
265      +0.12    0.7%
145      +0.11    0.6%
447      +0.10    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
319      +0.12       0.03
34       +0.11       0.04
145      +0.10       0.04
Negative Logits
Token           Logit
respectively    -1.89
iona            -1.54
oche            -1.54
,...            -1.50
etc             -1.50
doesnt          -1.46
ctomy           -1.46
cdot            -1.45
?!              -1.45
â̦              -1.43
Positive Logits
Token                 Logit
ĥ½                    3.07
↵                     2.87
(no visible token)    2.87
(no visible token)    2.87
<|outofrange|>        2.87
↵                     2.87
(no visible token)    2.87
↵                     2.87
↵ ³³³                 2.87
↵                     2.87
Activations Density 0.252%