INDEX
Explanations
repeated articles, specifically the word "the."
Neuron Alignment
Index    Value    % of L₁
417      +0.12    0.7%
225      +0.12    0.6%
341      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
10       +0.12       0.47
98       +0.12       0.21
228      +0.11       0.32
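The two columns above can differ widely for the same neuron (for example +0.12 against 0.47). A minimal sketch of why, assuming "P. Corr." is the Pearson correlation and "Cos Sim." the cosine similarity of two activation vectors; the dashboard does not state how either value is computed, and the function names and sample activations below are hypothetical:

```python
import math


def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cosine_sim(x, y):
    """Cosine similarity: no mean-centering, so a shared offset raises it."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical activations over six tokens (not taken from the dashboard).
feature_acts = [0.0, 0.0, 1.0, 0.0, 2.0, 0.0]
neuron_acts = [0.5, 0.4, 0.9, 0.6, 1.2, 0.5]

print(pearson_corr(feature_acts, neuron_acts))
print(cosine_sim(feature_acts, neuron_acts))
```

Because Pearson correlation subtracts each vector's mean first, two mostly positive activation vectors can have a high cosine similarity while being only weakly correlated.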
Negative Logits
rael        -1.67
lut         -1.66
ubert       -1.55
iet         -1.53
ieu         -1.52
"?"         -1.51
oun         -1.51
rier        -1.49
ruction     -1.49
ouden       -1.48
Positive Logits
precedent   1.58
GENERATED   1.51
MSS         1.40
Simplify    1.37
icity       1.37
older       1.36
PM          1.35
psin        1.32
fain        1.31
[(\[        1.31
Activations Density 2.906%