INDEX
Explanations
phrases related to abuse of power and corruption
Neuron Alignment
Index   Value   % of L₁
674     +0.09   0.3%
1316    +0.07   0.2%
1890    +0.07   0.2%
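
A minimal sketch of how figures like those under Neuron Alignment could be produced. It assumes that "Value" is one entry of the feature's weight vector over neurons and that "% of L₁" is that entry's absolute value divided by the vector's L1 norm; the page itself does not state either definition. The function name `top_aligned_neurons` and the toy vector are hypothetical, not taken from this page.

```python
def top_aligned_neurons(weights, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: share_of_l1 = |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful shares
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


# Toy vector for illustration only (not data from this page).
toy = [0.5, -1.5, 0.25, 2.0, -0.75]
for index, value, share in top_aligned_neurons(toy):
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, shares as small as 0.2% to 0.3% for the top neurons would indicate that the feature's weight is spread across many neurons rather than concentrated in a few.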
Correlated Neurons
Index   P. Corr.   Cos Sim.
1948    +0.09      0.05
1293    +0.07      0.05
801     +0.07      0.04
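
The two columns under Correlated Neurons are related but distinct measures. The sketch below shows the standard definitions, assuming "P. Corr." means Pearson correlation and "Cos Sim." means cosine similarity; the page does not say which vectors are compared (activations over tokens, weight vectors, or something else), so the inputs here are hypothetical toy lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Toy inputs for illustration only (not data from this page).
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
print(pearson_correlation(a, b), cosine_similarity(a, b))
```

Because Pearson correlation subtracts the mean first, the two measures can differ in magnitude and even in sign for the same pair of vectors, which is why a dashboard may report both.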
Negative Logits
apprehen    -1.12
reluct      -1.11
unspeak     -1.10
disagre     -1.09
depic       -1.05
exasper     -1.04
intersper   -1.01
shenan      -0.99
ineffec     -0.98
pooh        -0.95
Positive Logits
themselves   0.64
personal     0.62
selfish      0.62
himself      0.61
herself      0.60
profit       0.60
pecuni       0.57
personally   0.56
Zelanda      0.54
oneself      0.54
Activations Density 0.454%