INDEX
    Explanations

    This neuron essentially never activates except on the auxiliary verb “have,” suggesting that it detects that specific word.

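    Negative Logits
    (Tokens are quoted; a leading space inside the quotes is part of the token.)
      Token            Logit
      " renewable"     -0.07
      " unsafe"        -0.07
      ":expr"          -0.07
      ">");↵"          -0.07
      " punishable"    -0.06
      "(mem"           -0.06
      "-Speed"         -0.06
      " Atomic"        -0.06
      "emergency"      -0.06
      " MPL"           -0.06
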
    Positive Logits
      Token            Logit
      " have"           0.10
      " had"            0.07
      "fusion"          0.07
      " Στα"            0.07
      " дома"           0.07
      " khách"          0.07
      "_gradients"      0.07
      "ภายใน"           0.06
      " seafood"        0.06
      "have"            0.06
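
    The logit lists rank vocabulary tokens by how strongly this neuron pushes their logits down or up. Below is a minimal sketch of one common way such lists are produced. It assumes (this page does not confirm it) that each value is the dot product of the neuron's output-weight vector with a token's unembedding vector. The function names `logit_effects` and `top_logits` and the toy 3-dimensional vectors are hypothetical, not real model weights.

```python
# Hypothetical sketch: rank tokens by a neuron's direct effect on the logits.
# ASSUMPTION: effect = dot(neuron output weights, token unembedding vector).
# Vectors below are made-up toy values, not weights from any real model.

def logit_effects(w_out, unembed):
    """Return {token: effect}, where effect is dot(w_out, unembedding vector)."""
    return {
        tok: sum(a * b for a, b in zip(w_out, vec))
        for tok, vec in unembed.items()
    }

def top_logits(effects, k):
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # descending by effect
    return negative, positive

# Toy example: a 3-dimensional residual stream and a tiny vocabulary.
w_out = [0.5, -0.2, 0.1]
unembed = {
    " have": [0.2, 0.0, 0.0],
    " had": [0.1, -0.1, 0.0],
    " unsafe": [-0.1, 0.1, 0.0],
    " renewable": [-0.2, 0.0, 0.0],
    " seafood": [0.0, 0.0, 0.3],
}
effects = logit_effects(w_out, unembed)
neg, pos = top_logits(effects, k=2)
print("negative:", [t for t, _ in neg])
print("positive:", [t for t, _ in pos])
```

    Because the effect is linear in the neuron's activation, a larger activation scales every value in both lists by the same factor. The ranking of tokens therefore depends only on the weights, which is why these lists can be shown even when no activation examples are available.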
    Activation density: 0.036%
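
    An activation density of 0.036% means the neuron fires on roughly 36 of every 100,000 tokens. Below is a minimal sketch of the calculation. It assumes that density is the percentage of token positions whose activation is strictly above a threshold of 0; the page does not state the threshold it uses. The function name `activation_density` is hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where the neuron's activation exceeds a threshold.
# ASSUMPTION: the threshold is 0; the real dashboard may use another cutoff.

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation is strictly above `threshold`."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy data: 36 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(36):
    acts[i * 1000] = 1.0
print(f"{activation_density(acts):.3f}%")  # prints 0.036%
```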

    No known activation examples.