Explanations
mention of the word "jail" or related terms
Neuron Alignment
Index   Value   % of L₁
1937    +0.13   0.4%
421     +0.12   0.4%
920     +0.12   0.4%
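The dashboard does not say how the "% of L₁" column is derived. One plausible reading, offered here only as an assumption, is that each neuron's value is a component of the feature's direction and the percentage is that component's share of the vector's L₁ norm. The sketch below illustrates that reading; the function name `top_aligned` and the ranking by absolute value are hypothetical choices, not taken from the source.

```python
def top_aligned(weights, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: share_of_l1 = |w_i| / sum_j |w_j|. The dashboard's exact
    definition is not given in the source.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful shares
    # Rank indices by absolute weight, largest first (ties keep original order).
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


# Toy vector, not data from the dashboard.
for index, value, share in top_aligned([0.5, -0.25, 0.25, 0.0], k=2):
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a share as small as 0.4% for the top neuron would indicate that the feature is spread across many neurons rather than aligned with any single one.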
Correlated Neurons
Index   P. Corr.   Cos Sim.
1691    +0.13      0.02
920     +0.12      0.02
1937    +0.12      0.02
Negative Logits
underval     -0.49
Sif          -0.47
intersper    -0.47
Ked          -0.45
Gele         -0.44
verst        -0.44
Burke        -0.43
ANP          -0.43
endow        -0.43
vā           -0.42
Positive Logits
jail      1.12
Jail      1.06
Jail      1.05
jail      0.96
jails     0.94
jailed    0.81
prison    0.66
Prison    0.63
inmates   0.63
pecuni    0.62
Activations Density 0.043%