INDEX
Explanations
mathematical expressions, specifically involving primes and common denominators
Neuron Alignment
Index    Value    % of L₁
168      +0.11    0.6%
115      +0.10    0.6%
232      +0.10    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
168      +0.11       0.01
315      +0.10       0.01
192      +0.10       0.01
Negative Logits
vidia          -1.52
isk            -1.50
iret           -1.47
closely        -1.40
idea           -1.35
backgrounds    -1.34
shorts         -1.34
are            -1.33
ize            -1.32
ity            -1.30
Positive Logits
ĥ½                4.28
Ĥ                 3.69
↵ ↵               3.66
<|outofrange|>    3.66
↵↵↵               3.66
↵                 3.66
↵                 3.66
↵↵↵               3.66
<|outofrange|>    3.66
↵↵                3.66
Activations Density 0.033%