INDEX
    Explanations

    The neuron activates on self-referential stance phrases where the author expresses a goal of being positive or unbiased in their writing.

    Negative Logits
    Token            Logit
    '_formatter'     -0.07
    '_ps'            -0.07
    (not rendered)   -0.06
    '\tK'            -0.06
    ' граф'          -0.06
    ' credible'      -0.06
    (not rendered)   -0.06
    ' guarantee'     -0.06
    'Upgrade'        -0.06
    (not rendered)   -0.06

    Positive Logits
    Token            Logit
    'lder'            0.07
    'ливість'         0.06
    '(tokens'         0.06
    '.validators'     0.06
    ',String'         0.06
    '.csv'            0.06
    ' abbrev'         0.06
    '.Css'            0.06
    '("""'            0.06
    ' Talking'        0.06
    Activation Density: 0.068%
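A minimal sketch of how a density figure like this one could be computed, assuming it means the fraction of sampled tokens on which the neuron's activation is above zero. The page does not state its definition, so the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 68 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 68 + [0.0] * (100_000 - 68)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.068% would correspond to roughly 68 active tokens per 100,000 sampled.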

    No Known Activations