    Explanations

    The neuron activates specifically on mentions of a “token” reward-and-penalty system, that is, lines such as “You have 10 tokens to start,” “Each time you reject a question… 5 tokens will be deducted,” and similar token-count instructions.

    Negative Logits
     Album        -0.07
    UR            -0.07
     ott          -0.06
    leaning       -0.06
     poetic       -0.06
     practicing   -0.06
     tok          -0.06
     Owens        -0.06
     стра         -0.06
    ircles        -0.06

    Positive Logits
    /init         0.07
    _NAV          0.06
    .wait         0.06
    _DP           0.06
     mocked       0.06
    "Don          0.06
    ayan          0.06
                  0.06
     publication  0.06
    -option       0.06

    Act Density 0.002%

    No Known Activations