INDEX
Explanations
Text related to programming instructions and code, including function names, variable names, and system-specific terminology
Neuron Alignment
Index   Value   % of L₁
876     +0.15   0.5%
1699    +0.14   0.5%
453     +0.12   0.4%
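The "% of L₁" column is not defined on this page. A plausible reading, stated here as an assumption, is that it reports each neuron's absolute weight as a share of the L₁ norm (sum of absolute values) of the feature's weight vector. A minimal sketch under that assumption, using a hypothetical toy vector rather than any data from this page:

```python
def top_aligned(weights, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |w_i| divided by the sum of |w_j| over the vector.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank positions by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


# Hypothetical toy vector, not taken from the dashboard.
toy = [0.5, -1.5, 0.25, 2.0, -0.75]
rows = top_aligned(toy, k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a value of +0.15 at 0.5% would imply an L₁ norm of roughly 30, so the weight would be spread thinly over many neurons.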
Correlated Neurons
Index   P. Corr.   Cos Sim.
1317    +0.15      0.03
876     +0.14      0.00
1871    +0.12      0.02
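"P. Corr." presumably stands for Pearson correlation and "Cos Sim." for cosine similarity. The page does not say which vectors each measure is computed on, so the sketch below only illustrates how the two measures differ in general: Pearson correlation is cosine similarity after subtracting each vector's mean. The function names and toy inputs are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical toy vectors: b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0, 4.0]
b = [11.0, 12.0, 13.0, 14.0]
corr = pearson(a, b)
cos = cosine(a, b)
print(f"pearson={corr:.3f} cosine={cos:.3f}")
```

A constant offset leaves the Pearson correlation unchanged but changes the cosine similarity, which is one reason the two columns need not agree.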
Negative Logits
如下图       -0.78
Nuevas      -0.68
Conoce      -0.68
Podob       -0.68
oficina     -0.67
Misión      -0.67
tarjeta     -0.66
外部链接     -0.66
在这种       -0.66
كيفية       -0.66
Positive Logits
!...        1.77
lele        1.68
illi        1.67
sii         1.60
blos        1.56
?...        1.55
dora        1.51
bloss       1.49
!«          1.48
tremb       1.47
Activations Density 0.075%
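Activation density is commonly read as the fraction of sampled tokens on which the feature fires at all. The page does not give the exact definition or threshold, so the sketch below assumes "fires" means an activation strictly greater than zero. The sample is a hypothetical toy list.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is > threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active tokens out of 1000.
toy = [0.0] * 998 + [0.4, 1.2]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.075% means the feature is active on fewer than one token in a thousand.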