INDEX
    Explanations

    negative outcomes

    The neuron activates on words that signal risk, uncertainty, or potential negative outcomes (e.g. jeopardy, forced, may, risk).

    Negative Logits
    Token             Logit
    "/mat"            -0.07
    "-pills"          -0.07
    "YE"              -0.06
    "或者"            -0.06
    " menn"           -0.06
    (not recovered)   -0.06
    "tty"             -0.06
    " стандарт"       -0.06
    "很多"            -0.06
    "_Variable"       -0.06
    Positive Logits
    Token             Logit
    " drifted"        0.07
    "xBD"             0.07
    " disb"           0.06
    "cntl"            0.06
    "_REMOVE"         0.06
    (not recovered)   0.06
    "=obj"            0.06
    " oppress"        0.06
    "cripcion"        0.06
    (not recovered)   0.06
    Activation Density: 0.044%

    No Known Activations