INDEX
    Explanations

    demonstrate

    This neuron activates on numeric score tokens (decimal numbers) representing model confidence or sentiment scores.

    Negative Logits
     deletion           -0.08
     yazar              -0.07
     cheered            -0.07
     functionalities    -0.07
     stellt             -0.06
     sit                -0.06
     Bank               -0.06
     Mate               -0.06
    ,",                 -0.06
     Laurie             -0.06
    Positive Logits
    .GL                 0.07
                        0.06
                        0.06
    أم                  0.06
    Le                  0.06
    Restart             0.06
     čtvrt              0.06
     currentPosition    0.06
    toContain           0.06
    ?>">↵               0.06
    Act Density 0.038%

    No Known Activations