INDEX
Explanations
phrases that begin with an apostrophe
Neuron Alignment
Index   Value   % of L₁
50      +0.38   1.6%
2019    +0.11   0.5%
2011    +0.07   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1973    +0.38      0.06
305     +0.11      0.05
1672    +0.07      0.04
Negative Logits
<bos>   -1.48
ⓧ       -0.82
<?      -0.79
ੋ       -0.68
<eos>   -0.67
去      -0.67
do      -0.67
don     -0.66
comme   -0.66
ണ്ട     -0.66
Positive Logits
maneu        2.66
increa       2.48
emphat       2.46
accla        2.46
affor        2.45
reluct       2.43
shenan       2.31
practition   2.29
disagre      2.28
inev         2.26
Activations Density 0.044%