INDEX
    Explanations

    The neuron fires on words that signal cooperative effort toward survival (e.g. “work,” “together,” “survive”).

    Negative Logits
     dese         -0.08
     loc          -0.07
    ordes         -0.07
     igen         -0.07
     tion         -0.07
    .gt           -0.07
     виступ       -0.06
    _material     -0.06
    isd           -0.06
     produit      -0.06
    Positive Logits
    дат           0.06
    (\'           0.06
    Observable    0.06
    _finish       0.06
     арти         0.06
    Leo           0.06
    onestly       0.06
     LEN          0.06
     "..          0.06
    _totals       0.05
    Activation Density: 0.031%

    No Known Activations