Explanations
Proper nouns and locations (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
297     +0.15   0.6%
381     +0.15   0.5%
1562    +0.12   0.4%
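The dashboard does not state how the "% of L₁" column is computed. Here is a minimal sketch, assuming that the Value column lists the largest-magnitude entries of the feature's decoder vector in the neuron basis and that "% of L₁" is each entry's absolute value divided by the L₁ norm of the whole vector. The function name `neuron_alignment` and the input format are hypothetical.

```python
def neuron_alignment(decoder_vector, k=3):
    """Return (index, value, share_of_L1) for the k largest-magnitude entries.

    Assumption (not confirmed by the dashboard): share_of_L1 = |v_i| / sum_j |v_j|.
    """
    l1 = sum(abs(v) for v in decoder_vector)
    if l1 == 0:
        # An all-zero vector has no meaningful alignment.
        return []
    # Rank neuron indices by absolute value; ties keep their original order.
    ranked = sorted(range(len(decoder_vector)),
                    key=lambda i: abs(decoder_vector[i]),
                    reverse=True)
    return [(i, decoder_vector[i], abs(decoder_vector[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    for index, value, share in neuron_alignment([0.0, 0.5, -0.25, 0.25]):
        print(f"{index:<6} {value:+.2f}   {share:.1%}")
```

Under this reading, a small share such as 0.6% means the feature's direction is spread across many neurons rather than concentrated in neuron 297.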
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1127    +0.15           0.04
1562    +0.15           0.05
188     +0.12           0.04
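The two statistics in the Correlated Neurons table can be sketched as follows. This assumes that the Pearson correlation is taken between the feature's activations and a neuron's activations over the same tokens, and that the cosine similarity compares two weight vectors, such as the feature's decoder direction and a neuron's direction. The dashboard does not spell out either pairing. Both helper functions are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (e.g. per-token activations)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        # Correlation is undefined for a constant input; report 0 by convention here.
        return 0.0
    return cov / (sd_x * sd_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (e.g. weight directions)."""
    if len(u) != len(v):
        raise ValueError("vectors must be the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Values near zero in both columns, as in the table above, suggest that no single neuron tracks this feature closely.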
Negative Logits
<bos>   -0.89
மற்ற    -0.69
xxiii   -0.66
xxii    -0.66
xxiv    -0.63
xxvi    -0.60
소녀    -0.58
xxi     -0.58
xxv     -0.57
đồ      -0.54
Positive Logits
bamb       1.04
fré        1.02
dirond     1.00
quoique    1.00
cyr        0.99
nutr       0.98
parati     0.94
quarelle   0.94
exé        0.94
apparti    0.93
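The Negative and Positive Logits lists rank vocabulary tokens by how much the feature pushes their logits down or up. Here is a minimal sketch of one common way to produce such lists, assuming that the direct effect is the feature's decoder vector multiplied by the unembedding matrix. It ignores any final layer norm or scaling the real dashboard may apply. The function `logit_effects` and its argument layout (unembedding stored as d_model rows of vocabulary-sized lists) are hypothetical.

```python
def logit_effects(decoder_vector, unembed, vocab, k=10):
    """Return (top_k, bottom_k) lists of (token, score) pairs.

    Assumption: score[t] = sum_d decoder_vector[d] * unembed[d][t],
    i.e. the feature's direct effect on each token's logit.
    """
    d_model = len(decoder_vector)
    scores = [sum(decoder_vector[d] * unembed[d][t] for d in range(d_model))
              for t in range(len(vocab))]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda t: scores[t])
    bottom = [(vocab[t], scores[t]) for t in order[:k]]
    # Largest effect first for the positive list.
    top = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return top, bottom


if __name__ == "__main__":
    top, bottom = logit_effects([1.0, 0.0],
                                [[1.0, -2.0, 0.5], [3.0, 3.0, 3.0]],
                                ["a", "b", "c"], k=2)
    print(top, bottom)
```

Both lists here are led by subword fragments and numerals. This is common for such projections, and it explains why the entries do not read as whole words.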
Activation Density: 0.167%
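Activation density is most naturally read as the fraction of tokens on which the feature fires. A density of 0.167% therefore corresponds to roughly 1 token in 600 (1 / 0.00167 ≈ 599). Here is a minimal sketch, assuming "fires" means an activation strictly above zero. The helper name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: the dashboard counts a token as active when activation > 0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # One active token out of four gives a density of 25%.
    print(f"{activation_density([0.0, 0.0, 0.5, 0.0]):.3%}")
```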