INDEX
    Explanations

    punctuation

    The neuron selectively activates on subword tokens that include digits (or digit‐letter mixes), such as numbers, percentages, and chemical formulas.

    Negative Logits
    Token                Logit
    "14"                 -0.09
    "22"                 -0.08
    "0"                  -0.08
    " soc"               -0.07
    "574"                -0.07
    " bus"               -0.07
    "13"                 -0.07
    "ize"                -0.07
    (whitespace only)    -0.07
    " rig"               -0.07

    Positive Logits
    Token                Logit
    " overhe"            0.07
    " greatest"          0.07
    ".nextDouble"        0.07
    "++↵"                0.07
    " 双线"              0.07
    "عات"                0.07
    "子の"               0.07
    "같은"               0.07
    " trăm"              0.07
    "дром"               0.07
    Activation Density: 0.233%

    No Known Activations