INDEX
Explanations
texts related to criminal justice policy and reform
Neuron Alignment
Index   Value   % of L₁
198     +0.12   0.4%
1842    +0.08   0.2%
604     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
198     +0.12      0.05
1553    +0.08      0.06
100     +0.07      0.04
Negative Logits
intersper   -1.07
reluct      -1.06
shenan      -1.05
encomp      -1.05
maneu       -1.01
disagre     -1.00
unlaw       -0.96
fign        -0.94
unden       -0.94
depic       -0.94
Positive Logits
criminal        0.67
arrests         0.64
arrest          0.63
tences          0.61
arrested        0.59
convicted       0.57
felony          0.56
incarceration   0.55
hoga            0.55
prison          0.54
Activations Density 0.608%