INDEX
    Explanations

    The neuron activates on apology or regret phrases such as “I apologize,” “sorry,” and “not helpful.”

    Tokens are shown in quotes; a leading space is part of the token.

    Negative Logits
    " σε"         -0.06
    "-vars"       -0.06
    "_walk"       -0.06
    " finds"      -0.06
    "-pres"       -0.06
    "amodel"      -0.06
    " Pharmacy"   -0.06
    ")section"    -0.06
    "ольно"       -0.06
    " sends"      -0.05

    Positive Logits
    "ститут"      0.07
    "OrUpdate"    0.06
    " привы"      0.06
    "iento"       0.06
    " hữu"        0.06
    "863"         0.06
    "ुओ"          0.06
    " تقو"        0.06
    "230"         0.06
    " [$"         0.06
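    The page does not say how the logit lists are produced. A common convention for neuron dashboards is to take the neuron's output direction, project it through the unembedding matrix, and list the most negative and most positive vocabulary entries. The sketch below assumes that convention; the function names, the plain-list matrices, and the [d_model][vocab] layout of the unembedding are hypothetical choices for illustration, not this page's actual pipeline.

```python
def logit_effects(out_direction, unembed):
    """Direct effect of a neuron on each token's logit (assumed convention).

    out_direction: list of length d_model (the neuron's output weights).
    unembed: nested list with shape [d_model][vocab] (hypothetical layout).
    Returns one score per vocabulary token: out_direction . unembed[:, t].
    """
    d_model = len(out_direction)
    vocab = len(unembed[0])
    return [
        sum(out_direction[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]


def top_and_bottom(effects, tokens, k):
    """Return (top-k positive, top-k negative) (token, score) pairs."""
    # Indices sorted from most negative to most positive effect.
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    bottom = [(tokens[t], effects[t]) for t in order[:k]]
    top = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
effects = logit_effects([1.0, -1.0], [[0.5, 0.0, -0.25], [0.25, 0.75, 0.0]])
top, bottom = top_and_bottom(effects, ["a", "b", "c"], 1)
```

    Under this assumption, uniformly small scores like the ±0.06 above would mean the neuron's direct push on any single output token is weak.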
    Activation Density: 0.004%
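    Activation density is usually read as the percentage of sampled tokens on which the neuron fires. The sketch below assumes "fires" means an activation above a threshold of 0; the function name and threshold are hypothetical. At 0.004%, the neuron would fire on roughly 4 tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 4 firing tokens out of 100,000 gives about 0.004%.
density = activation_density([1.0] * 4 + [0.0] * 99_996)
```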

    No Known Activations