INDEX
Explanations
proper nouns related to individuals or organizations
Neuron Alignment
Index    Value    % of L₁
1343     +0.15    0.5%
227      +0.14    0.4%
1097     +0.13    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
981      +0.15       0.11
1097     +0.14       0.08
227      +0.13       0.08
Negative Logits
(blank)    -0.75
d          -0.72
...        -0.72
her        -0.72
no         -0.72
a          -0.71
so         -0.70
la         -0.70
to         -0.70
de         -0.69
Positive Logits
milano     2.29
cannes     2.16
bandung    2.01
tanga      1.99
napoli     1.96
marte      1.96
lele       1.96
sergio     1.95
jorge      1.94
casio      1.94
Activations Density 0.352%