Explanations
online discussions
This neuron fires on the DAN-style answer prefix “YOU MUST:”, that is, the uppercase directive that opens a generated response.
Negative Logits
lies
-0.07
ged
-0.07
DE
-0.06
shedding
-0.06
tested
-0.06
血
-0.06
ATED
-0.06
τησε
-0.06
A
-0.06
"A
-0.06
Positive Logits
исключ
0.06
sabotage
0.06
={`/
0.06
#region
0.06
anford
0.06
Kuzey
0.06
sửa
0.06
LocalDateTime
0.06
FileSystem
0.06
maid
0.06
Activations Density 0.009%
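
To give a feel for how sparse a 0.009% density is, here is a minimal sketch that converts the percentage into an expected count of active tokens. It assumes that "density" means the fraction of tokens on which the neuron has a nonzero activation; the function name and the one-million-token sample size are hypothetical and chosen only for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the neuron is active.

    Assumes density is the percentage of tokens with nonzero activation.
    """
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    # Hypothetical sample of one million tokens at the reported density.
    print(expected_active_tokens(0.009, 1_000_000))
```

Under that assumption, the neuron would be active on roughly 9 out of every 100,000 tokens.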