INDEX
Explanations
phrases related to political programs and policies
Neuron Alignment
Index    Value    % of L₁
872      +0.10    0.3%
1473     +0.09    0.3%
1823     +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
208      +0.10       0.06
1705     +0.09       0.05
1823     +0.09       0.03
Negative Logits
indestru     -1.08
reluct       -1.03
swarovski    -0.94
hentai       -0.93
hairc        -0.92
snoopy       -0.89
milf         -0.88
disagre      -0.88
jouet        -0.87
shenan       -0.86
Positive Logits
proposals    1.00
proposal     0.92
proposed     0.82
plans        0.81
plan         0.75
proposes     0.72
propose      0.70
Proposals    0.68
vision       0.68
ideas        0.68
Activations Density: 0.607%