INDEX
Explanations
words related to organizations, political movements, and campaigns
Neuron Alignment
Index    Value    % of L₁
50       +0.17    0.6%
1741     +0.16    0.5%
478      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
50       +0.17       0.05
204      +0.16       0.04
332      +0.11       0.05
Negative Logits
otheby       -0.94
ceptre       -0.82
vœux         -0.79
fantaisie    -0.76
USTAIN       -0.73
criptures    -0.71
noël         -0.70
eseorang     -0.67
bonté        -0.66
larmes       -0.66
Positive Logits
ability          0.67
inability        0.64
own              0.63
presence         0.62
MouseAdapter     0.59
unwillingness    0.59
biggest          0.58
insistence       0.55
contribution     0.55
fortunes         0.54
Activations Density: 0.236%