Explanations
phrases containing advice on managing emotions and relationships
Neuron Alignment
Index   Value   % of L₁
478     +0.11   0.3%
1510    +0.08   0.2%
1097    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1415    +0.11      0.03
646     +0.08      0.04
1209    +0.07      0.04
Negative Logits
fup         -1.10
fte         -1.09
ftu         -1.05
fta         -1.05
fep         -1.03
fei         -1.02
secon       -1.02
hcm         -1.01
frankfurt   -1.00
squa        -0.98
Positive Logits
success        0.66
better         0.63
happier        0.62
succeed        0.61
easier         0.58
success        0.57
successful     0.55
successes      0.55
successfully   0.53
jws            0.53
Activations Density 0.386%