INDEX
Explanations
complex instructions or steps, potentially related to problem-solving or decision-making processes
Neuron Alignment
Index    Value    % of L₁
82       +0.12    0.4%
889      +0.11    0.4%
1926     +0.10    0.4%
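
The "% of L₁" column is not defined on this page. A minimal sketch of one plausible reading follows, assuming it is the absolute value of one entry divided by the L₁ norm (sum of absolute values) of the whole vector. The function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(vector, index):
    """Return |vector[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means the share of the total absolute
    weight carried by a single entry. The source does not confirm this.
    """
    l1_norm = sum(abs(v) for v in vector)
    if l1_norm == 0:
        raise ValueError("vector has zero L1 norm")
    return abs(vector[index]) / l1_norm


# Toy vector for illustration only, not the real feature weights.
toy = [0.5, -0.25, 0.25]
print(f"{l1_share(toy, 0):.1%}")  # prints 50.0%
```

Under this reading, a value of +0.12 at 0.4% would imply an L₁ norm of roughly 30 for the full vector.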
Correlated Neurons
Index    P. Corr.    Cos Sim.
1926     +0.12       0.04
82       +0.11       0.03
1194     +0.10       0.03
Negative Logits
desir     -1.19
?...      -1.14
ugg       -1.14
milf      -1.10
fuf       -1.10
wherea    -1.06
fta       -1.04
perfet    -1.04
.-"       -1.04
»>        -1.04
Positive Logits
ones             0.72
ONE              0.71
PerformLayout    0.69
One              0.69
One              0.67
Ones             0.67
ONE              0.67
one              0.63
getOne           0.63
egiten           0.62
Activations Density 0.147%
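
Activation density is commonly read as the fraction of sampled tokens on which the feature fires. A minimal sketch under that assumption is shown here; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical, and the dashboard's exact definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy per-token activations for illustration only.
toy = [0.0, 0.0, 0.0, 1.5]
print(f"{activation_density(toy):.3%}")  # prints 25.000%
```

Read this way, a density of 0.147% would mean the feature fires on roughly 1 in 680 tokens.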