INDEX
Explanations
mentions of educational institutions and professional roles
Neuron Alignment
Index    Value    % of L₁
50       +0.18    0.6%
227      +0.10    0.4%
814      +0.08    0.3%
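The "% of L₁" column is not defined on this page. One plausible reading, assumed here and not confirmed by the source, is that it reports each neuron's absolute weight as a share of the L₁ norm of the feature's full weight vector. A minimal Python sketch under that assumption follows; the helper name `l1_share` and the toy weights are hypothetical.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means absolute value divided by the sum of
    absolute values of all entries. The dashboard does not state this.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy vector, not taken from the dashboard.
toy_weights = [0.5, -0.25, 0.25]
print(l1_share(toy_weights, 0))
```

If this reading is right, a value of +0.18 listed at 0.6% would imply an L₁ norm of very roughly 25 to 30 for the full vector, allowing for rounding in the displayed figures.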
Correlated Neurons
Index    P. Corr.    Cos Sim.
905      +0.18       0.04
954      +0.10       0.04
782      +0.08       0.04
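The two columns above appear to be Pearson correlation ("P. Corr.") and cosine similarity ("Cos Sim."); the page does not say which vectors each is computed over. As a sketch only, the standard definitions of both measures can be written in plain Python as below. The function names and the sample inputs are illustrative, not taken from the dashboard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine(xs, ys):
    """Cosine similarity: dot product normalized by both Euclidean norms."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)


# Illustrative inputs only.
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine([1.0, 0.0], [0.0, 1.0]))
```

Pearson correlation subtracts the mean before normalizing and cosine similarity does not, so the two can differ substantially for the same pair of vectors.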
Negative Logits
<bos>          -1.45
ⓧ              -0.83
springfox      -0.72
naudoti        -0.71
įsi            -0.65
overcrow       -0.65
<?             -0.62
nė             -0.62
(not shown)    -0.61
thicken        -0.61
Positive Logits
Mejía          0.86
Khart          0.84
Minang         0.82
Ribera         0.81
emmel          0.80
Bahía          0.80
Meksi          0.76
Cár            0.76
Mentre         0.76
véhic          0.76
Activations Density: 0.404%