INDEX
Explanations
positions or titles within an organization
Neuron Alignment
Index    Value    % of L₁
1842     +0.09    0.4%
394      +0.07    0.3%
50       +0.06    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1575     +0.09       0.09
1671     +0.07       0.08
1703     +0.06       0.06
Negative Logits
Token                 Logit
<bos>                 -1.09
public                -0.77
ⓧ                     -0.75
|}                    -0.75
/**                   -0.74
<?                    -0.74
//                    -0.74
@                     -0.72
(no visible token)    -0.72
if                    -0.70
Positive Logits
Token        Logit
maneu        1.83
affor        1.70
increa       1.66
accla        1.66
erad         1.64
impra        1.63
Khart        1.60
excru        1.59
resear       1.58
stockholm    1.58
Activations Density: 2.371%