INDEX
    Explanations

    This neuron detects hedging or probability expressions—words like “likely,” “expected,” or “probably” that signal uncertainty or anticipation.

    Negative Logits
    Token          Logit
    "’ve"          -0.07
    " Momentum"    -0.06
    "dhcp"         -0.06
    "/i"           -0.06
    "Mock"         -0.06
    " Rub"         -0.06
    "OW"           -0.06
    "rectangle"    -0.06
    "ky"           -0.06
    " jane"        -0.06
    Positive Logits
    Token          Logit
    "URED"         0.08
    "ести"         0.07
    " allied"      0.07
    " ces"         0.06
    " IA"          0.06
    "\tus"         0.06
    " deport"      0.06
    "。”↵↵"        0.06
    " stains"      0.06
    "oints"        0.06
    Act Density 0.052%
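
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes "Act Density" means the fraction of token positions on which the neuron's activation is above zero; the page does not state this definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density counts positions with activation strictly above the
    threshold; the source page does not define the metric.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 52 of which fire.
sample = [0.0] * 99_948 + [1.3] * 52
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.052%"
```

Under this reading, a density of 0.052% corresponds to roughly 52 active positions per 100,000 tokens.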

    No Known Activations