Explanations
positive experiences and transformation through challenging situations
Neuron Alignment
Index   Value   % of L₁
1013    +0.14   0.4%
381     +0.10   0.3%
964     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1601    +0.14      0.05
1265    +0.10      0.04
1358    +0.10      0.05
Negative Logits
affor     -1.18
increa    -1.08
kani      -1.07
Simult    -1.04
embodi    -1.03
fta       -1.02
volunte   -1.00
PLW       -1.00
haup      -0.99
unlaw     -0.98
Positive Logits
thought     0.72
thought     0.71
thinking    0.65
knew        0.64
<bos>       0.61
assumed     0.60
hadn        0.58
wasn        0.58
initially   0.58
myself      0.57
Activations Density: 0.645%