    Explanations

    This neuron remains inactive on ordinary text and appears tuned to detect a very specific rare token or formatting marker that isn’t present in these examples.

    Negative Logits
    Token            Logit
    " Shaw"          -0.07
    " Chin"          -0.07
    " famous"        -0.07
    " useful"        -0.07
    "一些"           -0.07
    "็นว"            -0.07
    " hot"           -0.07
    " DataLoader"    -0.07
    ".mobile"        -0.07
    " challenging"   -0.07
    Positive Logits
    Token            Logit
    " regardless"    0.15
    "Regardless"     0.11
    " Regardless"    0.11
    "ardless"        0.10
    " irrespective"  0.09
    " Leafs"         0.07
    " heed"          0.07
    " Yes"           0.07
    "Jak"            0.07
    " cps"           0.07
    Activation Density: 0.008%
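
The page does not say how the density figure is computed. As a rough sketch, assuming it is the percentage of sampled tokens on which the neuron's activation is above zero, it could be calculated as follows. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of 0.008%.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all. The dashboard's exact definition is not given.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 2 active tokens out of 25,000.
sample = [0.0] * 24998 + [0.7, 1.3]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density this low means the neuron fires on roughly one token in every 12,500, which is consistent with no activating examples having been recorded.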

    No Known Activations