INDEX
Explanations
instances of the word "described" in various contexts
Neuron Alignment
Index   Value   % of L₁
69      +0.12   0.7%
283     +0.11   0.6%
102     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
283     +0.12      0.03
223     +0.11      0.01
174     +0.10      0.02
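The page does not state how the two columns above are computed. A minimal sketch, assuming "P. Corr." is the Pearson correlation between two activation series over the same tokens and "Cos Sim." is the cosine similarity between two weight vectors; the function names `pearson` and `cosine` and the sample inputs are hypothetical and do not come from this page.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical inputs: per-token activations and direction vectors.
feature_acts = [0.0, 0.2, 0.0, 1.1, 0.4]
neuron_acts = [0.1, 0.3, 0.0, 0.9, 0.2]
print(pearson(feature_acts, neuron_acts))
print(cosine([1.0, 0.0, 2.0], [0.5, 1.0, 1.0]))
```

Under this reading the two columns measure different things: correlation compares behavior over inputs, while cosine similarity compares directions in weight space, so a neuron can rank high on one and near zero on the other, as the rows above suggest.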
Negative Logits
vein          -1.59
_             -1.45
itched        -1.40
aligned       -1.39
hereinafter   -1.37
ackage        -1.35
MOESM         -1.33
ably          -1.32
objection     -1.28
Bomb          -1.28
Positive Logits
âζ         1.86
deg        1.57
’          1.50
degrees    1.50
enne       1.49
hood       1.49
renthood   1.48
\][        1.46
nd         1.39
iors       1.37
Activations Density: 0.135%