Explanations
mentions of projects, creative endeavors, and community involvement
Neuron Alignment
Index   Value   % of L₁
1438    +0.15   0.5%
1535    +0.14   0.5%
1150    +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1438    +0.15      0.07
1013    +0.14      0.10
507     +0.13      0.06
Negative Logits
Token      Logit
affor      -2.21
increa     -2.20
guarante   -2.17
desir      -2.12
fuf        -2.07
ftu        -2.07
fta        -2.07
purcha     -2.04
perfon     -2.04
reluct     -2.04
Positive Logits
Token        Logit
.            0.94
<bos>        0.89
.”           0.86
。           0.84
."           0.79
!            0.79
ModelAdmin   0.79
}.           0.78
].           0.78
.}           0.78
Activations Density: 1.480%