    Explanations

    Declining to comment

    The neuron detects phrases in which the assistant is refusing a request (e.g. “I’m sorry but I cannot fulfill this request”).

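    Negative Logits
        Tokens are quoted to show leading spaces; ↵ marks a newline.
        Token                    Logit
        "Associ"                 -0.06
        "限定"                   -0.06   ("limited")
        " thông"                 -0.06
        " filePath"              -0.06
        "Spi"                    -0.06
        "日期"                   -0.06   ("date")
        " Зак"                   -0.06
        ".goto"                  -0.06
        (token not rendered)     -0.06
        "char"                   -0.06
    Positive Logits
        Token                    Logit
        " Reviews"                0.07
        " --↵"                    0.07
        "Outdoor"                 0.06
        ".done"                   0.06
        "Death"                   0.06
        " sacked"                 0.06
        "Measured"                0.06
        "lene"                    0.06
        "ROME"                    0.06
        "突然"                    0.06   ("suddenly")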
    Activation Density: 0.022%
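    Activation density is the share of sampled tokens on which the neuron fires. A density of 0.022% corresponds to roughly 22 tokens in every 100,000. The sketch below assumes that "fires" means an activation strictly above a threshold of 0; the viewer's actual cutoff is not stated. `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper. Assumption: density = active tokens / all sampled
    tokens, where "active" means activation > threshold.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 22 positive activations among 100,000 sampled tokens.
print(activation_density([0.0] * 99_978 + [0.5] * 22))  # 0.022
```

    Raising the threshold would lower the density, so two viewers can report different densities for the same neuron.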

    No Known Activations