INDEX
    Explanations

    This neuron specifically detects mentions of the word “life.”

    Negative Logits
    "Tex"                   -0.07
    (token text missing)    -0.07
    "-fashion"              -0.06
    " Razor"                -0.06
    "umann"                 -0.06
    "YELLOW"                -0.06
    "-picture"              -0.06
    "(render"               -0.06
    "(force"                -0.06
    "emplo"                 -0.06
    Positive Logits
    " lives"                0.06
    " life"                 0.06
    " Lands"                0.06
    " xOffset"              0.06
    " 만들"                 0.06
    " heartfelt"            0.06
    (token text missing)    0.06
    (token text missing)    0.06
    " simplify"             0.06
    "들에게"                0.06
    Act Density 0.024%

    No Known Activations