Explanations
phrases related to controversy or conflict
Neuron Alignment
Index   Value   % of L₁
1677    +0.13   0.5%
555     +0.12   0.4%
101     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1677    +0.13      0.03
474     +0.12      0.02
101     +0.11      0.02
Negative Logits
biograf    -0.76
utop       -0.74
hek        -0.69
ideolog    -0.67
boks       -0.66
makro      -0.66
kandid     -0.66
republi    -0.65
katastro   -0.65
vola       -0.64
Positive Logits
spin       1.29
spin       1.19
spinning   1.15
spun       1.13
Spin       1.11
spins      1.11
Spin       1.05
spinners   1.02
spinner    0.99
spinning   0.94
Activations Density: 0.124%