    Explanations

    online discussions

    This neuron fires on the DAN-style answer prefix “YOU MUST:”, the uppercase directive that opens a generated response.

    Negative Logits
        Token           Logit
         lies           -0.07
        ged             -0.07
         DE             -0.06
         shedding       -0.06
         tested         -0.06
                        -0.06
        ATED            -0.06
        τησε            -0.06
                        -0.06
        "A              -0.06
    Positive Logits
        Token           Logit
         исключ         0.06
         sabotage       0.06
        ={`/            0.06
        #region         0.06
        anford          0.06
         Kuzey          0.06
         sửa            0.06
         LocalDateTime  0.06
         FileSystem     0.06
        maid            0.06
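    The page does not say how the Negative and Positive Logits values were computed. A common approach, assumed here, is to project the neuron's output weight vector through the unembedding matrix (w_out @ W_U) and keep the k highest and k lowest tokens. The sketch below uses pure Python. The function names `direct_logit_effects` and `top_and_bottom` and all the numbers are hypothetical toy values, not this model's weights or vocabulary.

```python
# Hypothetical sketch (assumed method, toy numbers): score every vocabulary
# token by the neuron's direct effect on the logits, w_out @ W_U, then keep
# the k highest and k lowest scores.

def direct_logit_effects(w_out, W_U):
    """Return one score per vocabulary token.

    w_out: the neuron's output weight vector, length d_model.
    W_U:   unembedding matrix as d_model rows, each of length vocab_size.
    """
    vocab_size = len(W_U[0])
    return [
        sum(w_out[i] * W_U[i][j] for i in range(len(w_out)))
        for j in range(vocab_size)
    ]


def top_and_bottom(vocab, scores, k):
    """Return (positive, negative): the k highest and k lowest (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy values for illustration only; not this model's weights or vocabulary.
    vocab = ["alpha", "beta", "gamma", "delta"]
    w_out = [1.0, 2.0]
    W_U = [
        [0.5, -1.5, 0.0, 3.0],
        [0.25, 0.5, -1.0, -0.5],
    ]
    scores = direct_logit_effects(w_out, W_U)
    positive, negative = top_and_bottom(vocab, scores, k=2)
    print("Positive logits:", positive)
    print("Negative logits:", negative)
```

    Sorting once and slicing both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid the full sort.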
    Activation Density 0.009%
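    Assuming the usual definition, activation density is the percentage of tokens in the evaluation data on which the neuron is active. Under that assumption, 0.009% works out to about 9 active tokens per 100,000. The sketch below shows that arithmetic. The function `activation_density` and its default threshold of 0.0 are hypothetical choices, because the page does not state which cutoff was used.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a neuron's activation is strictly above a threshold. The 0.0 default is an
# assumption; the page does not state the cutoff.

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 9 firing tokens out of 100,000 gives a density of 0.009%.
    acts = [0.0] * 100_000
    for i in range(9):
        acts[i * 1000] = 1.0
    print(f"{activation_density(acts):.3f}%")
```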

    No Known Activations