INDEX
Explanations
phrases related to legislation, law enforcement, and socio-political issues
Neuron Alignment
Index    Value    % of L₁
554      +0.14    0.5%
1350     +0.11    0.4%
549      +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
554      +0.14       0.06
549      +0.11       0.04
325      +0.10       0.04
Negative Logits
apprehen    -0.91
reluct      -0.81
disagre     -0.80
YMMV        -0.79
iirc        -0.77
unspeak     -0.76
cuck        -0.76
horrend     -0.75
encomp      -0.75
affor       -0.75
Positive Logits
keep       1.09
keep       1.06
Keep       1.02
Keep       0.98
KEEP       0.96
KEEP       0.96
kept       0.95
kept       0.92
keeps      0.89
keeping    0.88
Activations Density 0.078%