INDEX
Explanations
paragraphs related to historical events and global politics
Neuron Alignment
Index   Value   % of L₁
776     +0.21   0.7%
381     +0.17   0.5%
1177    +0.12   0.4%
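The Neuron Alignment table lists the neurons that this feature writes to most strongly. The sketch below shows one way such a table could be computed. It assumes, following common SAE dashboard conventions, that "Value" is the feature's decoder weight on a neuron and "% of L₁" is that weight's share of the L₁ norm of the feature's decoder row. The function name `neuron_alignment` and the toy weights are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Return the top-k neurons by decoder weight for one feature.

    Assumption: decoder_row[n] is the feature's decoder weight on neuron n,
    and "% of L1" is |weight| divided by the L1 norm of the whole row.
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by signed decoder weight, largest first.
    ranked = sorted(range(len(decoder_row)), key=lambda n: decoder_row[n], reverse=True)
    return [
        (n, decoder_row[n], 100.0 * abs(decoder_row[n]) / l1_norm)
        for n in ranked[:k]
    ]


if __name__ == "__main__":
    # Toy decoder row for a hypothetical 6-neuron layer.
    row = [0.05, -0.02, 0.21, 0.01, 0.12, -0.09]
    for index, value, pct in neuron_alignment(row):
        print(f"{index}\t{value:+.2f}\t{pct:.1f}%")
```

Because the percentages are relative to the whole row, small values such as 0.7% suggest the feature's decoder direction is spread across many neurons rather than aligned with any single one.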
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
776     +0.21           0.07
321     +0.17           0.06
573     +0.12           0.05
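The Correlated Neurons table pairs each neuron with two scores: a Pearson correlation between activation traces and a cosine similarity between weight vectors. A minimal sketch of both measures follows. Which vectors a given dashboard compares for the cosine score is an assumption here, so the function `correlated_neurons` (hypothetical) takes them as explicit arguments.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation traces."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def correlated_neurons(feature_acts, neuron_acts, feature_dir, neuron_dirs, k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    feature_acts: the feature's activation on each token.
    neuron_acts[n]: neuron n's activation on the same tokens.
    feature_dir, neuron_dirs[n]: vectors compared by cosine similarity
    (an assumption; dashboards may compare different weight vectors).
    """
    rows = [
        (n, pearson(feature_acts, acts), cosine_similarity(feature_dir, neuron_dirs[n]))
        for n, acts in enumerate(neuron_acts)
    ]
    # Highest correlation first, matching the table's ordering.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]
```

A neuron can correlate with the feature's activations while pointing in a largely different direction in weight space, which is why the two columns need not agree.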
Negative Logits
jména         -0.57
<bos>         -0.49
unange        -0.48
wahre         -0.47
unglaub       -0.47
replaceable   -0.47
inigungs      -0.46
tudom         -0.46
;             -0.46
ucksack       -0.45
Positive Logits
Cfr           1.06
maksi         1.02
Molto         0.96
Certo         0.95
antik         0.95
kristal       0.94
kompati       0.90
Simult        0.90
kön           0.89
akut          0.89
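The Negative and Positive Logits lists show which output tokens this feature pushes down and up. A minimal sketch, assuming each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix (ignoring any final layer norm); the function `logit_effects` and its argument layout are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive tokens.

    Assumption: unembed[i][t] is the unembedding weight for residual
    dimension i and token t, so a token's score is
    sum_i decoder_dir[i] * unembed[i][t].
    """
    n_tokens = len(vocab)
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(n_tokens)
    ]
    # Ascending order: most negative effects first.
    order = sorted(range(n_tokens), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    # Largest positive effects first.
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive
```

Both lists come from a single pass over the vocabulary, so a real model with tens of thousands of tokens would normally do this with a matrix library rather than nested Python loops.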
Activation Density: 0.168%
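Activation density is the share of tokens on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero (the `threshold` parameter and the function name `activation_density` are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature fires (activation > threshold)."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)
```

At 0.168%, the feature would fire on about 1,680 of every 1,000,000 tokens, or roughly 1 token in 600.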