INDEX
Explanations
questions about ethics and morality, especially related to societal and political circumstances
Neuron Alignment
Index   Value   % of L₁
2034    +0.17   0.5%
197     +0.11   0.3%
674     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
197     +0.17      0.05
118     +0.11      0.05
610     +0.10      0.04
Negative Logits
Token     Logit
uncin     -0.76
solidar   -0.70
notor     -0.68
platt     -0.65
Wię       -0.65
ideolog   -0.65
dises     -0.64
tomat     -0.63
robus     -0.63
Warto     -0.62
Positive Logits
Token       Logit
$?          0.76
déploy      0.73
écout       0.71
prêtres     0.70
Juifs       0.65
chrétien    0.64
dédi        0.64
?           0.64
lumineuse   0.63
yoksa       0.63
Activations Density: 0.165%