Explanations
Contexts related to academic publications and research, specifically mentions of page numbers.
Neuron Alignment
Index   Value   % of L₁
331     +0.16   0.7%
1325    +0.14   0.6%
1810    +0.14   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.16      0.04
568     +0.14      0.03
1691    +0.14      0.03
Negative Logits
Token           Value
<bos>           -1.36
OGND            -0.65
незавершена     -0.64
intios          -0.60
bezeichneter    -0.59
Abitanti        -0.59
Климат          -0.57
はじめに        -0.57
Nationale       -0.56
Zdroje          -0.56
Positive Logits
Token           Value
pp              1.29
fortn           1.24
volunte         1.21
reluct          1.20
tolerably       1.20
sappi           1.18
impra           1.17
fep             1.16
maneu           1.16
fuf             1.13
Activations Density: 0.328%