INDEX
Explanations
words related to prisons and incarceration
Neuron Alignment
Index   Value   % of L₁
920     +0.14   0.5%
866     +0.14   0.5%
1691    +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1691    +0.14      0.03
866     +0.14      0.03
920     +0.12      0.03
Negative Logits
melk      -0.51
tages     -0.49
klap      -0.47
handels   -0.47
ulaski    -0.42
morgen    -0.41
knap      -0.41
datas     -0.41
herbe     -0.41
stille    -0.40
Positive Logits
prison      1.12
Prison      1.12
prison      1.10
Prison      1.09
prisons     1.02
Prisoners   0.94
PRISON      0.92
prisoners   0.90
Prisons     0.88
prisoner    0.85
Activations Density: 0.068%