INDEX
Explanations
phrases related to government, constitutional rights, and political figures
Neuron Alignment
Index   Value   % of L₁
478     +0.13   0.4%
1385    +0.11   0.4%
1937    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
131     +0.13      0.05
1937    +0.11      0.06
421     +0.11      0.05
Negative Logits
impra      -0.91
maneu      -0.87
inappro    -0.83
erad       -0.76
disagre    -0.76
unve       -0.75
vété       -0.74
noël       -0.73
madonna    -0.73
affor      -0.72
Positive Logits
our          0.98
Our          0.96
our          0.96
Our          0.94
ourselves    0.87
OUR          0.85
OUR          0.83
own          0.77
ours         0.70
我们的       0.67
Activations Density: 0.140%