    Explanations

    The neuron activates on modal auxiliary verbs (especially “can” and “could”) that express ability or possibility.

    Negative Logits
    Token              Logit
    ".false"           -0.08
    "POST"             -0.08
    "omidou"           -0.07
    (token not shown)  -0.07
    (token not shown)  -0.07
    "_First"           -0.07
    " RTE"             -0.07
    "_Post"            -0.07
    " Nicht"           -0.06
    "Trou"             -0.06
    Positive Logits
    Token      Logit
    " can"     0.24
    "can"      0.16
    " Can"     0.16
    " could"   0.15
    "Can"      0.14
    " CAN"     0.14
    " couldn"  0.13
    "-can"     0.13
    "could"    0.13
    "CAN"      0.12
    Activation density: 0.389%
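
As a rough illustration of what a density figure like 0.389% could mean, the sketch below assumes that density is the fraction of tokens on which the neuron's activation is above zero. The page itself does not state the definition, so treat that as an assumption; the function name `activation_density` and the sample values are hypothetical and are not taken from this neuron.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a positive
    activation. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for 1,000 tokens: 996 silent, 4 active.
sample = [0.0] * 996 + [1.2, 0.7, 2.5, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.389% would correspond to the neuron firing on roughly 4 tokens out of every 1,000.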

    No Known Activations