    Explanations

    This neuron activates on polite request phrasing (e.g. “we request,” “we would like to,” “please,” “to …”).

    Negative Logits
    Token          Logit
    " Gun"         -0.07
    "Bush"         -0.07
    " Bush"        -0.07
    " rightfully"  -0.07
    "unas"         -0.07
    "anza"         -0.07
    "AWS"          -0.07
    "Pu"           -0.06
    " Brennan"     -0.06
    "igans"        -0.06
    Positive Logits
    Token          Logit
    "<::"          0.08
    (blank)        0.07
    "leş"          0.07
    " möchten"     0.07
    ":"            0.07
    " Calgary"     0.07
    "らい"         0.07
    "された"       0.07
    (blank)        0.07
    " 가지"        0.07
    Activation Density: 0.033%

    No Known Activations