INDEX
Explanations
unintended thoughts, memories, and moments that keep resurfacing in the mind
Neuron Alignment
Index    Value    % of L₁
964      +0.12    0.4%
1013     +0.10    0.3%
674      +0.10    0.3%
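
The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. The page does not define the column, so that reading is an assumption. A minimal sketch under that assumption follows; the function name `l1_share` and the example vector are hypothetical and not taken from the dashboard.

```python
def l1_share(weights, top_k=3):
    """Return (index, value, share of L1 norm) for the top_k entries by |value|.

    Assumption: "% of L1" means |value| divided by the sum of |value| over
    the whole vector. The dashboard itself does not state this.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful shares
    # Indices ordered from largest to smallest absolute weight
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in order[:top_k]]


# Hypothetical 5-entry weight vector whose L1 norm is exactly 1.0
example = [0.5, -0.25, 0.125, 0.0, 0.125]
rows = l1_share(example, top_k=2)
for index, value, share in rows:
    print(f"{index:<8}{value:+.2f}    {share:.1%}")
```

If this reading is correct, the small shares in the table (0.3% to 0.4% for the top neurons) would indicate that the feature's weight is spread across many neurons rather than concentrated in a few.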
Correlated Neurons
Index    P. Corr.    Cos Sim.
1415     +0.12       0.03
194      +0.10       0.03
1267     +0.10       0.05
Negative Logits
swarovski     -1.05
hairc         -1.03
softshell     -0.91
ecru          -0.90
Xoxo          -0.90
unlaw         -0.89
tupperware    -0.83
Confu         -0.83
embodi        -0.82
velour        -0.81
Positive Logits
<bos>         0.84
myself        0.70
reminded      0.68
spontan       0.67
reger         0.63
besta         0.62
realis        0.62
handels       0.61
intrig        0.59
wondering     0.58
Activations Density 0.827%