INDEX
Explanations
criticism and controversies surrounding political figures, particularly discussions related to alliances, statements, and actions of political leaders
Neuron Alignment
Index   Value   % of L₁
604     +0.14   0.4%
24      +0.09   0.3%
198     +0.08   0.2%
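One way to read the Neuron Alignment table, sketched under an assumption the page does not state: "Value" is a neuron's weight in this feature's decoder direction, and "% of L₁" is that weight's magnitude divided by the L1 norm of the whole decoder vector. The function name `top_aligned_neurons` and the toy vector are hypothetical.

```python
def top_aligned_neurons(decoder_row, k=3):
    """Return (index, weight, share_of_L1) for the k largest weights.

    Assumption: the "% of L1" column is |weight| / sum(|w| for w in row).
    """
    l1 = sum(abs(w) for w in decoder_row)
    if l1 == 0:
        raise ValueError("decoder row has zero L1 norm")
    # Rank by signed value, largest first (sorted() is stable, so ties keep index order).
    order = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in order[:k]]


# Toy 4-dimensional decoder row (hypothetical values).
for idx, weight, share in top_aligned_neurons([0.25, -0.5, 1.0, 0.25]):
    print(f"{idx:<6}{weight:+.2f}  {share:.1%}")
```

Ranking by signed value matches the all-positive entries in the table; a dashboard could just as plausibly rank by magnitude.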
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
24      +0.14           0.06
915     +0.09           0.06
239     +0.08           0.06
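The two numeric columns of Correlated Neurons can be illustrated with a minimal sketch. It assumes (the page does not say) that the "P. Corr." column is a Pearson correlation between this feature's activations and a neuron's activations over the same tokens, and that the cosine-similarity column compares two weight vectors. The helpers `pearson` and `cosine` are hypothetical, written in plain Python.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (e.g. per-token activations)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (e.g. weight directions)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations for a feature and a neuron.
print(round(pearson([0.0, 1.0, 0.0, 2.0], [0.1, 0.9, 0.2, 1.5]), 3))
print(round(cosine([3.0, 4.0], [4.0, 3.0]), 3))
```

The two measures answer different questions: correlation asks whether the units fire on the same tokens, cosine similarity whether their directions point the same way.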
Negative Logits
Ename           -0.81
<bos>           -0.75
unspeak         -0.74
enlight         -0.70
apprehen        -0.70
unwarran        -0.68
Washable        -0.68
Permeability    -0.67
encomp          -0.65
tolerably       -0.65
Positive Logits
ideolog         0.67
himself         0.64
himself         0.59
republi         0.58
Obrador         0.58
religione       0.57
Republi         0.56
misst           0.56
akus            0.54
biograf         0.54
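The negative and positive logit lists look like a logit-lens readout: project the feature's decoder direction through the model's unembedding matrix and keep the lowest- and highest-scoring tokens. The sketch below assumes exactly that (the page does not describe its method). `logit_lens`, the toy unembedding matrix, and the three-token vocabulary are hypothetical.

```python
def logit_lens(direction, unembed, vocab, k=2):
    """Score every token by direction @ unembed; return (top_k, bottom_k).

    direction: length-d_model list (assumed: the feature's decoder vector).
    unembed:   d_model x n_vocab nested list (assumed: the model's W_U).
    """
    n_vocab = len(vocab)
    scores = [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(n_vocab)
    ]
    ranked = sorted(range(n_vocab), key=lambda t: scores[t])  # ascending
    bottom = [(vocab[t], scores[t]) for t in ranked[:k]]
    top = [(vocab[t], scores[t]) for t in reversed(ranked[-k:])]
    return top, bottom


# Toy example: d_model = 2, vocabulary of three tokens.
top, bottom = logit_lens([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=2)
print("positive:", top)
print("negative:", bottom)
```

A real readout would use the full model dimensions and may normalize the direction first; the ranking logic is the same.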
Activation Density: 0.696%
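Activation density is presumably the fraction of tokens on which the feature fires, so 0.696% would mean roughly 7 of every 1,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; `activation_density` and its `threshold` parameter are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 696 active tokens out of 100,000.
acts = [1.0] * 696 + [0.0] * (100_000 - 696)
print(f"{activation_density(acts):.3%}")
```

Python's `%` format type multiplies by 100 before rounding, which is how a raw fraction of 0.00696 is displayed as 0.696%.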