INDEX
    Explanations

    instructions

    This neuron detects the command token “start,” i.e. user instructions that begin with the word “start.”

    Negative Logits
    (options
    -0.07
     deity
    -0.06
     rice
    -0.06
    انگ
    -0.06
    /layouts
    -0.06
     Oslo
    -0.06
     rock
    -0.05
     blo
    -0.05
     col
    -0.05
    	filter
    -0.05
    Positive Logits
    имо
    0.07
     плат
    0.07
    porate
    0.07
     vandalism
    0.07
     amatør
    0.06
    구글상위
    0.06
    jedn
    0.06
    0.06
    -common
    0.06
    istant
    0.06
    Activation Density 0.139%

    No Known Activations