Explanations
text related to policy changes or modifications in guidelines
Neuron Alignment
Index   Value   % of L₁
50      +0.12   0.5%
1036    +0.05   0.2%
1508    +0.05   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1036    +0.12      0.04
1866    +0.05      0.04
2042    +0.05      0.04
Negative Logits
<bos>            -1.88
ⓧ                -1.16
[token missing]  -1.02
/**              -1.01
/*               -0.98
<?               -0.95
/*++             -0.85
#![              -0.81
<?               -0.80
#                -0.78
Positive Logits
affor     1.61
wien      1.60
jaya      1.57
bandung   1.57
increa    1.53
hcm       1.51
maneu     1.51
maroc     1.50
lele      1.49
aen       1.48
Activations Density 0.107%