Explanations
phrases related to routines and personal development
Neuron Alignment
Index   Value   % of L₁
50      +0.34   1.1%
1700    +0.07   0.2%
1967    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1700    +0.34      0.03
357     +0.07      0.02
1638    +0.07      0.05
Negative Logits
<bos>        -0.99
indescri     -0.91
unve         -0.90
depic        -0.89
apprehen     -0.87
unspeak      -0.86
gaily        -0.85
encomp       -0.85
intersper    -0.85
snoopy       -0.84
Positive Logits
raso         0.97
furg         0.96
kafe         0.92
moza         0.91
kosme        0.90
utop         0.87
kokos        0.86
karton       0.86
palio        0.86
churras      0.85
Activations Density: 0.491%