Explanations
mentions of corruption and military activities
Neuron Alignment
Index   Value   % of L₁
198     +0.13   0.4%
604     +0.11   0.3%
1284    +0.09   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
198     +0.13      0.06
392     +0.11      0.03
566     +0.09      0.06
Negative Logits
disagre     -1.34
impra       -1.31
reluct      -1.25
shenan      -1.23
uninten     -1.22
apprehen    -1.15
excru       -1.15
increa      -1.13
cuck        -1.11
unspeak     -1.10
Positive Logits
corruption     0.86
corruption     0.81
corrupt        0.73
Corruption     0.72
Corruption     0.69
politicians    0.60
getItemId      0.58
corrupción     0.58
corrom         0.56
officials      0.55
Activations Density 0.955%
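
The density figure can be read as the share of tokens on which the feature fires at all. The sketch below is a minimal illustration under that assumption; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) feature activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 tokens, 10 of which activate the feature.
toy_activations = [0.0] * 990 + [0.5] * 10
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 1.000%
```

Under this reading, a density of 0.955% would mean the feature is active on roughly one token in a hundred, which fits a sparse, topic-specific feature.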