INDEX
Explanations
content that discusses the division of objects or concepts into subcategories
Neuron Alignment
Index   Value   % of L₁
50      +0.09   0.3%
976     +0.07   0.3%
1778    +0.07   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
327     +0.09      0.04
639     +0.07      0.03
1269    +0.07      0.03
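The two columns of the Correlated Neurons table measure different things, and a short sketch may help show why they can differ. It assumes that "P. Corr." is the Pearson correlation between two activation series and that "Cos Sim." is the cosine similarity between two vectors; the page itself does not say how either was computed, so the function names and inputs below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumed meaning of 'P. Corr.')."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Covariance term divided by the product of the centered norms.
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed meaning of 'Cos Sim.')."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Pearson correlation subtracts the mean before comparing, while cosine similarity does not, so the two need not agree even when applied to the same pair of vectors.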
Negative Logits
Token       Logit
<bos>       -1.55
public      -0.75
ⓧ           -0.69
///**       -0.66
get         -0.65
do          -0.64
@           -0.63
,           -0.63
}{||        -0.62
enumerate   -0.62
Positive Logits
Token       Logit
affor       1.79
stockholm   1.75
lele        1.72
umo         1.69
Juf         1.68
hcm         1.68
increa      1.60
bandung     1.59
milano      1.59
meis        1.56
Activations Density 0.172%
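
As a rough illustration of what a density figure like this can mean, the sketch below assumes density is the fraction of sampled tokens on which the feature's activation exceeds a threshold (zero by default). That definition is an assumption, not something stated on the page, and the function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold` (assumed definition of density)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 172 active tokens out of 100,000 gives a density of 0.172%.
sample = [1.0] * 172 + [0.0] * (100_000 - 172)
density = activation_density(sample)
```

Under this reading, a density of 0.172% means the feature fires on roughly 1.7 of every 1,000 tokens.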