INDEX
    Explanations

    This neuron consistently activates on numeric tokens (especially decimal numbers and years).

    Negative Logits
    Token           Logit
    377             -0.07
     JWT            -0.06
    800             -0.06
    ejména          -0.06
    eroon           -0.06
    гот             -0.06
    Null            -0.06
     hashtag        -0.06
                    -0.06
     SSP            -0.06
    Positive Logits
    Token           Logit
     threatens      0.07
     Declarations   0.06
     Spread         0.06
     Music          0.06
     Evil           0.06
    -social         0.06
    -esque          0.06
    yz              0.06
     Qualified      0.06
    گیری            0.06
    Activation Density: 0.028%

    No Known Activations