INDEX
Explanations
phrases related to nuclear weapons, international relations, and political figures
Neuron Alignment
Index   Value   % of L₁
382     +0.16   0.5%
2019    +0.14   0.4%
1150    +0.14   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.16      0.08
1445    +0.14      0.07
1265    +0.14      0.05
Negative Logits
pymysql    -0.86
rempla     -0.86
amitié     -0.85
légiti     -0.85
illustre   -0.85
vété       -0.84
exé        -0.84
prétend    -0.83
sappi      -0.83
recueil    -0.82
Positive Logits
there          0.72
it             0.65
we             0.63
etc            0.60
there          0.59
they           0.59
this           0.56
you            0.56
respectively   0.53
these          0.52
Activations Density 0.393%