Explanations
terms related to political power dynamics and capabilities
Neuron Alignment
Index   Value   % of L₁
50      +0.27   1.0%
1839    +0.09   0.3%
124     +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
284     +0.27      0.04
124     +0.09      0.04
1839    +0.08      0.04
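
The page does not say how the columns in the two neuron tables are computed. As a minimal sketch, assuming "% of L₁" is an entry's share of a vector's L1 norm, "P. Corr." is a Pearson correlation, and "Cos Sim." is a cosine similarity, the textbook definitions look like this. The function names and the toy vectors are hypothetical and are not taken from the dashboard.

```python
import math

def l1_share(values, i):
    """Share of the vector's L1 norm contributed by entry i (assumed meaning of '% of L1')."""
    l1_norm = sum(abs(v) for v in values)
    return abs(values[i]) / l1_norm

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def cosine(xs, ys):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)

# Hypothetical toy data, only to exercise the functions.
toy_vector = [3.0, -1.0]
share = l1_share(toy_vector, 0)                       # 3 / (3 + 1)
corr = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])      # perfectly linear
sim = cosine([1.0, 0.0], [0.0, 1.0])                  # orthogonal vectors
```

Pearson correlation subtracts each sequence's mean and cosine similarity does not, so the two measures can differ widely for the same pair of vectors.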
Negative Logits
Token         Logit
<bos>         -2.14
(not shown)   -0.73
enshr         -0.72
inaugurate    -0.69
/**           -0.68
harmonize     -0.68
ⓧ             -0.64
abolish       -0.64
reunite       -0.63
<?            -0.62
Positive Logits
Token      Logit
paradiso   1.27
soggior    1.10
bandung    1.08
riva       1.03
megane     1.02
venuto     0.96
toscana    0.95
!!</       0.95
lele       0.95
eiffel     0.95
Activations Density: 0.292%