    Explanations

    character descriptions

    This neuron activates on internal metadata tokens (e.g., header or control-code markers such as `<|start_header_id|>`) rather than on content words.
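
To make the distinction between metadata tokens and content words concrete, here is a minimal sketch that separates the two in a token list. It assumes special tokens follow the `<|name|>` pattern of the `<|start_header_id|>` example; the helper names and the sample token list (including `<|end_header_id|>`) are hypothetical and chosen only for illustration.

```python
import re

# Assumption: metadata tokens look like `<|name|>`, as in `<|start_header_id|>`.
# Other tokenizers may mark special tokens differently.
SPECIAL_TOKEN = re.compile(r"<\|[^|<>]+\|>")


def is_metadata_token(token: str) -> bool:
    """Return True if the whole token looks like a header or control marker."""
    return SPECIAL_TOKEN.fullmatch(token) is not None


def split_tokens(tokens):
    """Partition tokens into (metadata, content) lists, preserving order."""
    metadata = [t for t in tokens if is_metadata_token(t)]
    content = [t for t in tokens if not is_metadata_token(t)]
    return metadata, content


# Hypothetical token sequence, for illustration only.
tokens = ["<|start_header_id|>", "user", "<|end_header_id|>", "Hello", " world"]
metadata, content = split_tokens(tokens)
```

Under this assumption, the explanation above says the neuron would be expected to fire on the tokens in `metadata` and not on those in `content`.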

    Negative Logits
    Token             Logit
    "	Field"       -0.07
    "Honda"           -0.06
    "hape"            -0.06
    "_birth"          -0.06
    "-mean"           -0.06
    "_lit"            -0.06
    " prosecutors"    -0.06
    " Bos"            -0.06
    " strdup"         -0.06
    "不安"            -0.06
    Positive Logits
    Token             Logit
    " swirl"          0.07
    " علاق"           0.07
    " jadx"           0.06
    "特色"            0.06
    " accomplished"   0.06
    " classe"         0.06
    "ýn"              0.06
    "-inspired"       0.06
                      0.05
    " denying"        0.05
    Activation Density: 0.024%

    No Known Activations