Explanations
Code or writing formatting
This neuron responds to the special “jailbreak” prompt markers (e.g., the bracketed “[JAILBREAK]” token) used in DAN-style instructions.
Negative Logits
-campus
-0.07
xt
-0.07
CircularProgress
-0.07
よく
-0.07
XD
-0.07
紅
-0.07
yaygın
-0.06
.AbsoluteConstraints
-0.06
XT
-0.06
racer
-0.06
Positive Logits
=!
0.07
他们
0.07
(',')
0.07
arkers
0.06
.avi
0.06
.Constants
0.06
Patterns
0.06
.examples
0.06
requisite
0.06
.eps
0.06
Activations Density 0.001%