INDEX
Explanations
phrases emphasizing exclusivity or singularity
Neuron Alignment
Index    Value    % of L₁
419      +0.17    1.0%
500      +0.15    0.8%
78       +0.13    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
435      +0.17       0.05
500      +0.15       0.05
78       +0.13       0.04
Negative Logits
ļ              -1.83
ĸ              -1.73
angles         -1.64
Ļ              -1.59
generations    -1.57
directions     -1.54
¹              -1.49
steps          -1.48
ories          -1.48
rations        -1.46
Positive Logits
forge     1.80
upon      1.61
jam       1.60
CTX       1.59
yon       1.58
xiv       1.57
safely    1.56
GRP       1.56
quote     1.56
MTP       1.55
Activations Density: 0.312%