INDEX
Explanations
references to personal achievements or qualifications
Neuron Alignment
Index   Value   % of L₁
50      +0.26   0.9%
198     +0.08   0.3%
401     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1330    +0.26      0.07
848     +0.08      0.05
310     +0.07      0.06
Negative Logits
<bos>     -2.93
(blank)   -0.90
(blank)   -0.87
(blank)   -0.86
成        -0.85
add       -0.85
.         -0.85
↵         -0.85
(blank)   -0.85
</h1>     -0.84
Positive Logits
affor     3.41
increa    3.34
maneu     3.26
impra     3.26
reluct    3.22
secon     2.99
inev      2.98
accla     2.95
Juf       2.94
wherea    2.94
Activations Density 0.919%