INDEX
Explanations
Words related to political tension and high-stakes situations, potentially in the context of political leaders and elections.
Neuron Alignment
Index   Value   % of L₁
599     +0.10   0.3%
1798    +0.08   0.2%
266     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
599     +0.10      0.05
636     +0.08      0.04
1997    +0.08      0.03
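
The two columns above measure different things, and a small sketch may make the distinction concrete. The code below is illustrative only: it assumes "P. Corr." is a Pearson correlation and "Cos Sim." is a cosine similarity, and it does not claim to reproduce how this page computes either value. The function names and sample vectors are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance of the mean-centered series,
    divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms.
    Unlike Pearson, the vectors are not mean-centered first."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical data: the second series is the first shifted by a constant.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
print(pearson(a, b))  # the shift does not affect Pearson correlation
print(cosine(a, b))   # the shift does affect cosine similarity
```

Because Pearson correlation subtracts the mean and cosine similarity does not, the two numbers need not agree, which is consistent with the table listing them as separate columns.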
Negative Logits
Token      Logit
reluct     -1.05
ftu        -1.01
apprehen   -0.96
increa     -0.96
disagre    -0.95
desir      -0.95
fta        -0.94
depic      -0.94
nece       -0.93
unlaw      -0.93
Positive Logits
Token          Logit
stakes         0.85
stakes         0.68
importance     0.67
Stakes         0.67
IsContent      0.64
stake          0.63
implications   0.63
importance     0.59
decisions      0.59
consequences   0.57
Activations Density 0.518%
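
As a rough illustration of what a density figure like this could mean, the sketch below treats activation density as the fraction of sampled tokens on which the feature fires above a threshold. That definition, the function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration, not something this page states.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition; the dashboard may compute density differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 1000 tokens, of which 5 activate the feature.
sample = [0.0] * 995 + [1.2] * 5
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.518% would correspond to the feature firing on roughly 5 of every 1000 sampled tokens.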