INDEX
Explanations
the symbol '-' indicating lists or negative items
Neuron Alignment
Index   Value   % of L₁
50      +0.39   1.6%
381     +0.10   0.4%
382     +0.09   0.4%
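The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's direction vector. The sketch below illustrates that reading on a toy vector. The function name `top_aligned_neurons`, the toy values, and the choice to rank by absolute value are assumptions for illustration, not taken from the dashboard. Under this reading, the first row above (value +0.39 at 1.6%) would imply an L₁ norm of roughly 24.

```python
def top_aligned_neurons(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    `direction` is a plain list of floats standing in for a feature's
    weight vector over neurons (hypothetical input, not dashboard data).
    """
    # L1 norm: sum of absolute values of all entries.
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []
    # Rank neuron indices by absolute weight, largest first (assumed ranking).
    ranked = sorted(range(len(direction)),
                    key=lambda i: abs(direction[i]),
                    reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in ranked[:k]]


# Toy vector whose L1 norm is exactly 1.0, so shares equal magnitudes.
toy = [0.5, -0.25, 0.0, 0.25]
rows = top_aligned_neurons(toy, k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Each printed row mirrors one row of the table: neuron index, signed value, and that value's magnitude as a percentage of the vector's L₁ norm.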
Correlated Neurons
Index   P. Corr.   Cos Sim.
426     +0.39      0.04
1630    +0.10      0.04
1468    +0.09      0.04
Negative Logits
<bos>         -2.27
ⓧ             -0.77
implement     -0.61
              -0.60
get           -0.60
butterknife   -0.60
,             -0.59
<eos>         -0.59
in            -0.59
put           -0.59
Positive Logits
wien      1.74
affor     1.73
lele      1.68
accla     1.66
volunte   1.64
coö       1.64
emphat    1.63
unlaw     1.63
maneu     1.63
increa    1.60
Activations Density 0.105%