INDEX
Explanations
references to forms of punishment, justice, and government actions in a political and historical context
Neuron Alignment
Index   Value   % of L₁
479     +0.12   0.4%
1548    +0.12   0.4%
1047    +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1548    +0.12      0.02
1691    +0.12      0.02
1993    +0.10      0.03
Negative Logits
OGND      -0.67
utop      -0.54
ideolog   -0.53
rezept    -0.51
kopi      -0.50
ceb       -0.49
VIAF      -0.49
vort      -0.49
protes    -0.48
kandid    -0.47
Positive Logits
punishment    1.03
punish        1.01
punishments   0.99
punished      0.96
Punishment    0.95
penalty       0.95
Penalty       0.95
punishing     0.90
Penalty       0.88
penalties     0.88
Activations Density 0.069%