INDEX
Explanations
names of mythological figures
Neuron Alignment
Index   Value   % of L₁
1271    +0.15   0.5%
395     +0.12   0.4%
1187    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1120    +0.15      0.08
1343    +0.12      0.07
690     +0.11      0.07
Negative Logits
<bos>         -0.84
hjæl          -0.67
kræ           -0.66
Пу            -0.63
świę          -0.61
い            -0.61
つ            -0.60
arbejde       -0.60
bibnamefont   -0.59
꺼            -0.59
Positive Logits
stockholm   2.09
madonna     2.07
?...        2.04
emphat      2.02
maneu       2.00
lidl        1.96
wien        1.96
reluct      1.96
increa      1.96
strick      1.95
Activations Density: 0.401%