Explanations
the last names of prominent figures or individuals mentioned in various articles
Neuron Alignment
Index   Value   % of L₁
1323    +0.17   1.0%
25      +0.13   0.7%
896     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1097    +0.17      0.07
1363    +0.13      0.05
1343    +0.12      0.07
Negative Logits
<bos>        -2.35
ोंने         -0.67
CreateMap    -0.65
Normdatei    -0.65
ⓧ            -0.64
الاص         -0.64
#            -0.63
omaterial    -0.63
Ольга        -0.62
}{||         -0.61
Positive Logits
aen       1.80
unden     1.71
fta       1.69
compen    1.64
Keny      1.63
secon     1.63
wherea    1.62
Juf       1.58
»>        1.58
thut      1.56
Activations Density 0.696%