INDEX
Explanations
instances of the word "until."
Neuron Alignment
Index   Value   % of L₁
478     +0.14   0.8%
292     +0.13   0.7%
193     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
292     +0.14      0.04
193     +0.13      0.04
404     +0.12      0.04
Negative Logits
helm    -1.75
keit    -1.74
¢       -1.53
th      -1.50
STEM    -1.43
shoes   -1.40
ister   -1.40
gly     -1.39
kin     -1.37
alike   -1.35
Positive Logits
ahoma         1.66
ndef          1.48
iful          1.48
elimination   1.45
midnight      1.44
completion    1.41
iday          1.39
elif          1.39
date          1.39
else          1.38
Activations Density: 0.194%