INDEX
Explanations
that span multiple words related to career and personal development
Neuron Alignment
Index   Value   % of L₁
74      +0.08   0.2%
58      +0.08   0.2%
369     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
74      +0.08      0.03
1718    +0.08      0.05
875     +0.07      0.03
Negative Logits
increa      -0.84
scrat       -0.84
reluct      -0.81
guarante    -0.80
effe        -0.78
unden       -0.78
YMMV        -0.77
perfet      -0.76
purcha      -0.76
encomp      -0.76
Positive Logits
lost         0.98
lost         0.86
Lost         0.71
Lost         0.69
regained     0.68
restored     0.68
forgotten    0.67
LOST         0.65
stolen       0.65
damaged      0.64
Activations Density 0.342%