    Explanations

    The neuron specializes in detecting the token “No” in the assistant’s responses.

    Negative Logits
    	put    -0.06
     گرد    -0.06
    .dir    -0.06
    .COM    -0.06
     explanations    -0.06
     тим    -0.06
     Ли    -0.06
    fi    -0.06
    _PUT    -0.06
     Stefan    -0.06
    Positive Logits
     stimulate    0.07
    ốc    0.07
    getInt    0.06
    _neurons    0.06
     expanded    0.06
    toDouble    0.06
     vigorously    0.06
    haar    0.06
     ayında    0.06
    acent    0.06
    Act Density 0.010%
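
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the neuron's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means share of positions where the neuron fires at all.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, exactly one of which fires.
acts = [0.0] * 9_999 + [1.7]
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.010%"
```

Under this assumption, a density of 0.010% corresponds to roughly one active position per 10,000 tokens.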

    No Known Activations