INDEX
    Explanations

    punctuation

    This neuron highlights numeric tokens, particularly rating-scale numbers (e.g. 1, 60, 80, 100) and decimal values, used in score/evaluation prompts.

    Negative Logits
    (token not shown)   -0.07
    "78"                -0.07
    " Ά"                -0.07
    " sediment"         -0.07
    " stripes"          -0.07
    "	Grid"             -0.07
    (token not shown)   -0.07
    " worms"            -0.06
    "Keith"             -0.06
    " kills"            -0.06
    Positive Logits
    "andbox"            0.06
    " Çin"              0.06
    " Wak"              0.06
    " 기반"             0.06
    "_um"               0.05
    " envoy"            0.05
    " USART"            0.05
    " discriminate"     0.05
    "ований"            0.05
    "іння"              0.05
    Act Density 0.005%

    No Known Activations