INDEX
    Explanations

    This neuron detects hedging or epistemic-modal language: words that signal opinion, belief, expectation, or uncertainty.

    Negative Logits
    "already"       -0.07
    "ся"            -0.07
    (not shown)     -0.07
    "Less"          -0.06
    "Once"          -0.06
    " altına"       -0.06
    "Also"          -0.06
    "Meanwhile"     -0.06
    " Meanwhile"    -0.06
    " not"          -0.06
    Positive Logits
    " إلا"          0.07
    ".sc"           0.07
    "_SLEEP"        0.06
    " GEN"          0.06
    "as"            0.06
    " TEMP"         0.06
    "[T"            0.06
    "-gen"          0.06
    " Alejandro"    0.06
    " Tan"          0.06
    Act Density 0.068%

    No Known Activations