    Explanations

    code/technical documents

    The neuron fires on the floating-point score token (e.g. “3.82…” or “3.84…”) that immediately precedes the model’s “Yes”/“No” answer.

    Negative Logits
    Token             Logit
    " FH"             -0.08
    " Isle"           -0.08
    "▍▍"              -0.07
    "湿"              -0.06
    ".tt"             -0.06
    " Stefan"         -0.06
    "_fft"            -0.06
    (not shown)       -0.06
    "_VENDOR"         -0.06
    " Vapor"          -0.06

    Positive Logits
    Token             Logit
    "Remaining"       0.07
    "да"              0.07
    "obot"            0.06
    "aes"             0.06
    " carbohydrate"   0.06
    "om"              0.06
    " computation"    0.06
    " versión"        0.06
    ".reduce"         0.06
    " expanded"       0.06
    Activation Density: 0.001%

    No Known Activations