INDEX
    Explanations

    This neuron activates on the word “Taxonomy” (and its token pieces), i.e. labels or category headers indicating taxonomic classification.

    Negative Logits
    Token       Logit
    " going"    -0.07
    "kker"      -0.07
    " bakeca"   -0.06
    " Mär"      -0.06
    (blank)     -0.06
    "?>"        -0.06
    "348"       -0.06
    "<A"        -0.06
    " drops"    -0.06
    "ORN"       -0.06
    Positive Logits
    Token                 Logit
    "****************"    0.07
    " ButterKnife"        0.07
    "formerly"            0.06
    " accumulating"       0.06
    " Tahoe"              0.06
    " επί"                0.06
    "attach"              0.06
    " vengeance"          0.06
    " bullied"            0.06
    "utenant"             0.06
    Activation Density: 0.001%

    No Known Activations