Explanations
phrases related to political figures and historical events
Neuron Alignment
Index    Value    % of L₁
1842     +0.27    0.8%
1967     +0.14    0.4%
872      +0.12    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1842     +0.27       0.06
227      +0.14       0.07
69       +0.12       0.01
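
The two columns above can differ widely for the same neuron, as the values show. The following is a minimal sketch of why, not the dashboard's actual computation: it assumes "P. Corr." is a Pearson correlation and "Cos Sim." a cosine similarity, and it uses made-up vectors (`feature_acts`, `neuron_acts`) purely for illustration. Pearson correlation is the cosine similarity of the mean-centered vectors, so a constant offset changes one measure but not the other.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)

# Hypothetical values, not taken from the dashboard.
feature_acts = [1.0, 2.0, 3.0, 4.0]
neuron_acts = [11.0, 12.0, 13.0, 14.0]  # same shape, shifted by a constant

pearson = pearson_correlation(feature_acts, neuron_acts)
cosine = cosine_similarity(feature_acts, neuron_acts)
print(f"Pearson: {pearson:.3f}, cosine: {cosine:.3f}")
```

In the dashboard the two columns may also be computed over different quantities (for example activations versus weight vectors), which would be a further reason for them to disagree.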
Negative Logits
lapto        -0.74
kask         -0.71
moza         -0.68
vrt          -0.64
poveznice    -0.63
zima         -0.61
palet        -0.61
overcrow     -0.61
traktor      -0.61
tyn          -0.61
Positive Logits
Darío        0.78
Áng          0.76
trône        0.72
Valentín     0.71
Lucía        0.71
fameux       0.70
désert       0.69
Apare        0.69
bénéfice     0.68
Cárdenas     0.68
Activations Density: 0.635%