INDEX
Explanations
phrases related to future aspirations and achievements
Neuron Alignment
Index    Value    % of L₁
764      +0.22    0.7%
1842     +0.21    0.7%
674      +0.16    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
764      +0.22       0.06
875      +0.21       0.03
1683     +0.16       0.05
Negative Logits
effe        -1.62
guarante    -1.58
fte         -1.52
maneu       -1.51
increa      -1.51
thut        -1.49
inev        -1.48
fta         -1.48
ftu         -1.48
volunte     -1.46
Positive Logits
someday            0.73
AndEndTag          0.72
Winaray            0.66
TokenNameLBRACE    0.63
rrggbb             0.63
Vikipedi           0.61
hopefully          0.60
GraphicsUnit       0.59
eventually         0.57
nakalista          0.57
Activations Density: 0.610%