INDEX
Explanations
the past tense form of verbs or references to past actions
Neuron Alignment
Index   Value   % of L₁
156     +0.69   10.9%
111     +0.67   10.7%
71      +0.04   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
4       +0.69      0.67
473     +0.67      0.66
49      +0.04      0.67
Negative Logits
ĥ½                -2.34
®                 -2.32
ħ                 -2.27
¾                 -2.21
ı                 -2.21
ľĵ                -2.18
Ľ                 -2.18
<|outofrange|>    -2.18
<|outofrange|>    -2.18
↵                 -2.18
Positive Logits
rox         0.85
acha        0.85
ensive      0.84
iang        0.84
roscopic    0.84
anks        0.82
rosis       0.82
um          0.81
snaps       0.81
orph        0.80
Activations Density: 1.434%