INDEX
Explanations
names of researchers and co-authors in academic publications
Neuron Alignment
Index   Value   % of L₁
1343    +0.14   0.5%
227     +0.13   0.4%
283     +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.14      0.06
227     +0.13      0.05
1097    +0.12      0.04
Negative Logits
<bos>           -0.79
himo            -0.76
hiszen          -0.68
ñadir           -0.68
wikidata        -0.67
millimeters     -0.66
<>());          -0.66
مرح             -0.65
]-->            -0.65
Personendaten   -0.64
Positive Logits
attemp    1.70
maneu     1.69
impra     1.63
depic     1.59
encomp    1.58
inappro   1.55
reluct    1.52
strick    1.51
increa    1.50
inev      1.49
Activations Density: 0.163%