INDEX
    Explanations

    The neuron fires on tokens in polite self-commitment phrases, especially offers of assistance such as “I’ll do my best to help/assist you”.

    Negative Logits
    Token                  Logit
    "sexo"                 -0.07
    " lying"               -0.07
    " قابل"                -0.07
    " خی"                  -0.07
    "(pt"                  -0.06
    "_stderr"              -0.06
    " vitro"               -0.06
    " صح"                  -0.06
    " původ"               -0.06
    "\twhile"              -0.06

    Positive Logits
    Token                  Logit
    " Southwest"            0.07
    ".toolStripButton"      0.06
    "gerald"                0.06
    " underestimated"       0.06
    "PRI"                   0.06
    "istra"                 0.06
    " MOT"                  0.06
    (blank)                 0.06
    " NORTH"                0.06
    " BASE"                 0.06
    Act Density 0.007%
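
The page does not define "Act Density". A common reading, assumed here, is the fraction of tokens on which the neuron's activation is above zero. Under that assumption, 0.007% corresponds to roughly 7 active tokens per 100,000. The sketch below illustrates the arithmetic; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with activation above
    the threshold. The source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 7 active tokens out of 100,000.
acts = [0.0] * 99_993 + [1.5] * 7
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.007%
```

A density this low means the neuron is silent on almost all tokens, which would be consistent with the page listing no known activations.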

    No Known Activations