Explanations
terms related to neuroscience and socio-political topics
Neuron Alignment
Index  Value  % of L₁
678    +0.15  0.5%
304    +0.13  0.4%
1490   +0.13  0.4%
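The "% of L₁" column is not defined on this page. One reading that is consistent with the rows shown is that each value is divided by the L₁ norm (sum of absolute values) of the whole feature direction. The sketch below assumes that reading; `neuron_alignment` and the toy vector are hypothetical and are not taken from this feature.

```python
import numpy as np

def neuron_alignment(direction, top_k=3):
    """Return (index, value, share_of_l1) for the top_k entries by |value|.

    Assumption: "% of L1" means |value| / sum(|direction|).
    """
    direction = np.asarray(direction, dtype=float)
    l1 = np.abs(direction).sum()
    if l1 == 0:
        raise ValueError("direction has zero L1 norm")
    # Sort indices by descending absolute value and keep the first top_k.
    order = np.argsort(-np.abs(direction))[:top_k]
    return [(int(i), float(direction[i]), float(abs(direction[i]) / l1))
            for i in order]

# Toy vector with hypothetical values (L1 norm is 1.0).
toy = np.array([0.1, -0.4, 0.3, 0.2])
rows = neuron_alignment(toy, top_k=2)
for index, value, share in rows:
    print(f"{index:<6d} {value:+.2f}  {share:.1%}")
```

Under this reading, a value of +0.15 at 0.5% would imply an L₁ norm of roughly 30 for the full direction, which is why no single neuron dominates.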
Correlated Neurons
Index  P. Corr.  Cos Sim.
678    +0.15     0.05
1490   +0.13     0.05
1907   +0.13     0.03
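The two columns above measure different things, which is why they need not agree. As a minimal illustration, assuming "P. Corr." is a Pearson correlation and "Cos Sim." is a cosine similarity between two vectors (the page does not say which vectors are compared), the helper below computes both. `pearson_and_cosine` and the sample arrays are hypothetical.

```python
import numpy as np

def pearson_and_cosine(a, b):
    """Return (Pearson correlation, cosine similarity) of two 1-D arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pearson subtracts each array's mean before comparing.
    pearson = np.corrcoef(a, b)[0, 1]
    # Cosine similarity compares the raw vectors without centering.
    cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(pearson), float(cosine)

# Toy data: b is a shifted by a constant, so the two are perfectly
# correlated but do not point in exactly the same direction.
p, c = pearson_and_cosine([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(f"pearson={p:.3f} cosine={c:.3f}")
```

Because Pearson correlation removes the mean and cosine similarity does not, the same pair of vectors can score differently on each.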
Negative Logits
in       -0.77
at       -0.75
близь    -0.75
розта    -0.74
of       -0.74
to       -0.71
or       -0.71
is       -0.70
зберіга  -0.70
as       -0.70
Positive Logits
sappi    1.77
dises    1.73
abnorm   1.72
affez    1.71
abbra    1.68
dispen   1.68
ritard   1.67
nece     1.65
pessi    1.64
effe     1.64
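Lists like the two above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below shows that projection under stated assumptions: the page does not confirm how its logits were computed, and `top_logit_tokens`, the toy vocabulary, and the toy matrix are all hypothetical.

```python
import numpy as np

def top_logit_tokens(direction, unembed, vocab, k=2):
    """Return (most_negative, most_positive) token/score lists.

    Assumption: scores are direction @ unembed, where unembed has
    shape (d_model, vocab_size).
    """
    effects = np.asarray(direction, dtype=float) @ np.asarray(unembed, dtype=float)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy setup with d_model = 2 and a four-token vocabulary.
vocab = ["a", "b", "c", "d"]
unembed = np.array([[1.0, -1.0, 0.5, 0.0],
                    [0.0, -2.0, -2.0, 1.0]])
direction = np.array([1.0, 0.5])

negative, positive = top_logit_tokens(direction, unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Token fragments such as word prefixes appear in these lists because the scores are assigned per vocabulary entry, not per whole word.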
Activations Density 0.168%