INDEX
    Explanations

    caring, family

    This neuron detects words expressing positive actions or endorsements (e.g., giving, recommending, buying, caring, generous).

    Negative Logits
    Token        Logit
    809          -0.07
     rugs        -0.07
    amik         -0.06
    _today       -0.06
    essel        -0.06
     fluffy      -0.06
     листь       -0.06
                 -0.06
     underwear   -0.06
    _ASSERT      -0.06
    Positive Logits
    Token        Logit
     moral       0.06
    -div         0.06
     monet       0.06
     tone        0.06
     biblical    0.06
     perv        0.06
     imperson    0.06
     spor        0.06
     staple      0.06
     PW          0.06
    Act Density 0.356%

    No Known Activations