INDEX
Explanations
mentions of different individuals within a particular context
Neuron Alignment
Index    Value    % of L₁
453      +0.17    0.5%
964      +0.12    0.4%
50       +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
981      +0.17       0.06
1001     +0.12       0.05
724      +0.12       0.04
Negative Logits
Token           Logit
<bos>           -1.09
magnify         -0.56
thoughtless     -0.55
or              -0.55
and             -0.53
тол             -0.51
to              -0.51
bestow          -0.51
endeavouring    -0.51
unwarran        -0.51
Positive Logits
Token       Logit
alkoh       1.45
kosme       1.27
kompati     1.27
antik       1.27
silikon     1.27
kram        1.25
optik       1.23
akut        1.22
praktik     1.21
logis       1.20
Activations Density: 0.106%