Explanations
The neuron activates on words about capital punishment—especially “execute,” “death,” and “penalty.”
Negative Logits
iating     -0.06
Brooklyn   -0.06
Jade       -0.06
Stride     -0.06
مسائل      -0.06
Tampa      -0.06
труб       -0.06
석         -0.06
ุณ         -0.06
kidneys    -0.06
Positive Logits
cription   0.07
hearty     0.07
yet        0.07
stuff      0.06
Чтобы      0.06
then       0.06
är         0.06
town       0.06
"<?        0.06
cdr        0.06
Activations Density: 0.001%