INDEX
Explanations
phrases related to political discussions and societal issues
Neuron Alignment
Index   Value   % of L₁
1741    +0.36   1.3%
50      +0.21   0.8%
2019    +0.18   0.7%
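The dashboard does not define these columns. One common reading, assumed here, is that "Value" is a neuron's weight in the feature's decoder direction over MLP neurons, and "% of L₁" is that weight's magnitude as a share of the direction's total L₁ norm. Under that reading, +0.36 at 1.3% implies an L₁ norm of roughly 27. A minimal sketch under that assumption, using a hypothetical `neuron_alignment` helper and a toy vector rather than real weights:

```python
def neuron_alignment(direction: list[float], k: int = 3) -> list[tuple[int, float, float]]:
    """Return (neuron index, weight, % of L1 norm) for the k largest weights.

    Assumption: `direction` is the feature's decoder vector over MLP neurons,
    and neurons are ranked by signed weight (largest positive first).
    """
    l1_norm = sum(abs(w) for w in direction)
    # Sort neuron indices by weight, largest first, and keep the top k.
    top = sorted(range(len(direction)), key=lambda i: direction[i], reverse=True)[:k]
    return [(i, direction[i], 100.0 * abs(direction[i]) / l1_norm) for i in top]


# Toy 5-neuron direction (not real data); its L1 norm is 2.0.
toy = [0.1, -0.4, 0.5, 0.2, -0.8]
for idx, value, pct in neuron_alignment(toy):
    print(f"{idx:>5}  {value:+.2f}  {pct:.1f}%")
```

Whether the real dashboard ranks by signed weight or by absolute weight is not stated; all three rows shown above are positive, which is consistent with either.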
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
50      +0.36           0.07
16      +0.21           0.10
1959    +0.18           0.08
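The correlated-neuron columns are likewise given only as abbreviations. Assuming "Pearson Corr." is the correlation between the feature's activations and a neuron's activations over the same tokens, and "Cosine Sim." is the cosine similarity between the feature's weight vector and the neuron's weight vector, both reduce to a few lines of Python. The helpers `pearson` and `cosine` and the toy inputs are hypothetical:

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length activation series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy activations over 4 tokens (not real data).
feature_acts = [0.0, 1.0, 2.0, 3.0]
neuron_acts = [0.1, 0.9, 2.2, 2.8]
print(round(pearson(feature_acts, neuron_acts), 3))
print(round(cosine([1.0, 0.0], [1.0, 1.0]), 3))  # 1/sqrt(2), about 0.707
```

If those definitions hold, the pattern in the table (moderate correlation but cosine similarity near 0.1) would suggest neurons that co-fire with the feature without sharing much of its weight direction.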
Negative Logits
vété         -1.03
Mère         -1.01
ritratto     -1.00
pitié        -0.98
noël         -0.97
gardien      -0.94
oeil         -0.92
dîner        -0.92
malheureux   -0.92
marchand     -0.92
Positive Logits
astéro        0.77
fact          0.76
lack          0.69
presence      0.68
inability     0.68
question      0.68
same          0.66
vast          0.66
implication   0.65
idea          0.65
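Logit tables in sparse-autoencoder dashboards are often computed as the direct effect of the feature's decoder direction on each output token: the direction multiplied by the model's unembedding matrix W_U, ignoring the final LayerNorm. Whether this dashboard computes its values exactly that way is an assumption. The sketch below uses a hypothetical `logit_effects` helper and a made-up two-dimensional model to rank tokens by that product:

```python
def logit_effects(
    direction: list[float],
    unembed: list[list[float]],
    vocab: list[str],
    k: int = 3,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most positive and k most negative direct logit effects.

    direction: the feature's decoder vector (length d_model), assumed.
    unembed: W_U as d_model rows, each holding one weight per vocab token.
    """
    d_model = len(direction)
    # Direct effect on token t: sum over d of direction[d] * W_U[d][t].
    effects = [
        sum(direction[d] * unembed[d][t] for d in range(d_model))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: effects[t])  # most negative first
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy model: d_model = 2, four-token vocabulary (not real data).
W_U = [
    [0.5, 0.2, -0.3, 0.0],
    [0.1, 0.6, 0.4, -0.2],
]
pos, neg = logit_effects([1.0, -1.0], W_U, ["tok_a", "tok_b", "tok_c", "tok_d"], k=2)
print("positive:", pos)
print("negative:", neg)
```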
Activation Density: 0.470%
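Activation density is most naturally read as the percentage of tokens on which the feature's activation is positive. Under that reading, 0.470% means the feature fires on about 47 of every 10,000 tokens. A minimal sketch with a hypothetical `activation_density` helper and synthetic activations:

```python
def activation_density(activations: list[float]) -> float:
    """Percentage of tokens on which the feature fires (activation > 0)."""
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Synthetic activations: 5 firing tokens out of 1,000.
acts = [0.0] * 995 + [1.2, 0.4, 3.1, 0.7, 2.0]
print(f"{activation_density(acts):.3f}%")  # prints 0.500%
```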