INDEX
Explanations
The neuron responds to the quoted instruction-prefix pattern used in prompt-injection directives (e.g. a persona name in double quotes followed by a colon, as in “Bo i Bot:”).
unfiltered, sexually explicit conversations.
Negative Logits
Token            Logit
cow              -0.06
pais             -0.06
Roman            -0.06
řez              -0.06
vascular         -0.06
END              -0.06
उप               -0.06
_p               -0.06
-rec             -0.06
Vend             -0.06

Positive Logits
Token            Logit
differed          0.08
ROLLER            0.07
differences       0.07
Sellers           0.07
kernels           0.07
trigger           0.06
Collaboration     0.06
stable            0.06
ARD               0.06
]*                0.06
Activations Density: 0.005%
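
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is obtained: the fraction of token positions on which the neuron's activation is above zero. This is an assumption about how the value was computed, not something the page states; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density means "share of tokens on which the neuron fires";
    the dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active token out of 20,000 positions.
sample = [0.0] * 19_999 + [1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.005% corresponds to roughly one firing token in every 20,000, which would make this a very sparse neuron.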