INDEX
Explanations
occurrences of the word "closing" or related terms indicating finality
Neuron Alignment
Index  Value  % of L₁
376    +0.17  1.0%
503    +0.12  0.7%
239    +0.11  0.6%
Correlated Neurons
Index  P. Corr.  Cos Sim.
258    +0.17     0.02
239    +0.12     0.01
448    +0.11     0.01
Negative Logits
Token           Logit
Ļ               -2.94
↵               -2.88
<|outofrange|>  -2.88
↵               -2.88
(blank)         -2.88
↵               -2.88
↵               -2.88
↵               -2.88
↵               -2.88
↵               -2.88
Positive Logits
Token         Logit
issues        1.96
cule          1.92
dilemma       1.77
gaps          1.74
mechanism     1.69
issue         1.67
idone         1.65
malf          1.65
defects       1.64
instructions  1.63
Activations Density: 3.566%