INDEX
    Explanations

    This neuron activates on words that signal a missing action or an error, namely negations and contrastive terms (e.g. “but,” “never”) that highlight something not being done.

    Negative Logits
    "stras"         -0.06
    "éra"           -0.06
    "student"       -0.06
    "Cart"          -0.06
    ".var"          -0.06
    "563"           -0.06
    " noodles"      -0.06
    "ให"            -0.06
    "Patient"       -0.06
    " Ala"          -0.06
    Positive Logits
    " Bri"          0.07
    " presumed"     0.07
    " लगभग"         0.07
    " gri"          0.07
    " предполаг"    0.07
    "/File"         0.06
    " khu"          0.06
    " conceivable"  0.06
    " 후기"         0.06
    " fh"           0.06
    Act Density 0.014%

    No Known Activations