INDEX
    Explanations

    The neuron selectively activates on the word “cause” (including its inflected form “causes”).

    Negative Logits
    “ stepping”       -0.07
    “<l”              -0.07
    “Robert”          -0.07
    “ screenshots”    -0.07
    “ reimb”          -0.07
    “ popping”        -0.07
    “ Elliott”        -0.06
    “ membranes”      -0.06
    “legt”            -0.06
    “lazy”            -0.06

    Positive Logits
    “ cause”           0.11
    “ causes”          0.09
    “Cause”            0.08
    “cause”            0.08
    “Aw”               0.08
    “ Cause”           0.08
    “_CUR”             0.07
    “-half”            0.07
    “callable”         0.07
    “_CUSTOMER”        0.07

    Act Density 0.016%

    No Known Activations