Explanations
New Auto-Interp: References to political figures and activities, particularly those relating to controversies and power dynamics.
Neuron Alignment
Index   Value   % of L₁
168     +0.16   0.7%
1942    +0.14   0.6%
251     +0.11   0.5%
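The dashboard does not define its Neuron Alignment columns. What follows is a minimal sketch, assuming "Value" is the feature's decoder weight on a given neuron and "% of L₁" is that weight's magnitude as a share of the decoder vector's L₁ norm. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (index, weight, percent_of_L1) for the k largest-magnitude weights.

    Assumption: "% of L1" = |weight| / sum(|w| for w in decoder_vec) * 100.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # sorted() is stable, so equal magnitudes keep ascending index order.
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1)
            for i in ranked[:k]]


# Toy example: L1 norm is 1.0, so the percentages are 50, 25 and 12.5.
print(neuron_alignment([0.125, -0.5, 0.25, 0.125], k=3))
```

A negative weight still ranks by magnitude, which is why the sign is kept in the returned weight but dropped in the percentage.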
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
168     +0.16           0.03
1174    +0.14           0.03
1942    +0.11           0.03
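The "Pearson Corr." column presumably measures how closely a neuron's activations track the feature's activations across tokens. Below is a minimal sketch of a Pearson correlation between two activation series, assuming that reading. The dashboard does not say what the cosine similarity is taken between, so it is not reproduced here. The function name `pearson` and the sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations share the same 1/n factor,
    # so it cancels and is omitted.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


feature_acts = [0.0, 0.0, 1.5, 0.2, 0.0]   # hypothetical per-token values
neuron_acts = [0.1, -0.2, 0.9, 0.3, 0.0]
print(pearson(feature_acts, neuron_acts))
```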
Negative Logits
awtextra          -0.52
flox              -0.51
BIBSYS            -0.50
дописавши         -0.50
ToScroll          -0.49
Iné               -0.47
mender            -0.46
utnik             -0.46
Württemberg       -0.46
principalColumn   -0.45
Positive Logits
ph         1.28
Ph         1.28
Ph         1.16
ph         1.09
PH         1.02
PHILL      0.95
PHO        0.95
PHILLIPS   0.94
phi        0.92
PH         0.91
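The two logit lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. Below is a minimal sketch of one common way to produce such lists, assuming the score is a plain dot product between the feature's decoder vector and each token's unembedding vector, with any final layer norm ignored. The function name `logit_effects` and the toy vocabulary are hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Return (most negative, most positive) tokens by direct logit effect.

    Assumption: unembed[t] is the unembedding vector for vocab[t], and the
    effect is dot(decoder_vec, unembed[t]).
    """
    scores = [sum(d * u for d, u in zip(decoder_vec, row)) for row in unembed]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


neg, pos = logit_effects(
    decoder_vec=[1.0, 0.0],
    unembed=[[2.0, 0.0], [-1.0, 0.0], [0.5, 1.0]],
    vocab=["ph", "flox", "phi"],
    k=1,
)
print(neg, pos)
```

Tokens that differ only in a leading space or capitalization have separate unembedding vectors, which would explain why near-duplicates such as "ph" and "Ph" can each appear more than once.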
Activation Density: 0.147%
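Activation density is read here as the share of tokens on which the feature is active. Below is a minimal sketch, assuming "active" means an activation strictly above a threshold of zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active token out of four gives 25.0 (percent).
print(activation_density([0.0, 0.0, 1.2, 0.0]))
```

Under this reading, a density of 0.147% means the feature fires on roughly 1 token in 680.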