    Explanations

    This neuron activates on content that triggers the model’s refusal policy: it detects disallowed or unethical requests, along with the words typically used to decline them (e.g. “cannot,” “recommend,” “illegal,” “assist”).

    Negative Logits
    Token            Logit
    JSImport         -0.06
    	actual          -0.06
     راست            -0.06
    (whitespace)     -0.06
    .findElement     -0.06
    (whitespace)     -0.06
    linger           -0.06
    кими             -0.06
     τι              -0.06
    (not shown)      -0.06
    Positive Logits
    Token             Logit
     и                0.07
    .GetText          0.07
     rallying         0.06
     Predictor        0.06
     PropertyChanged  0.06
     launch           0.06
    =None             0.06
    inter             0.06
    \C                0.06
     Txt              0.06
    Activation density: 0.059%
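
    As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of token positions on which the neuron's activation exceeds zero. That definition, the function name activation_density, and the sample data are all assumptions made for illustration; they are not taken from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (# positions with activation > threshold) / (# positions).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data, not the neuron's real activations:
# 100,000 token positions, 59 of which have a nonzero activation.
sample = [0.0] * 99_941 + [1.0] * 59

density = activation_density(sample)
print(f"Activation density: {density:.3%}")
```

    Under this assumed definition, a density of 0.059% would mean the neuron fires on roughly 6 out of every 10,000 tokens, which is sparse.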

    No Known Activations