    Explanations

    The neuron activates on words expressing a promise or commitment (e.g. “promise”).

    Negative Logits
    Token           Logit
    " Frog"         -0.07
    " мала"         -0.06
    " intend"       -0.06
    " blanket"      -0.06
    "Titulo"        -0.06
    " nut"          -0.06
    " supper"       -0.06
    " juices"       -0.06
    ".priority"     -0.06
    "_unit"         -0.06
    Positive Logits
    Token           Logit
    "ycled"         0.07
    "py"            0.07
    " /**↵"         0.07
    " ${({"         0.06
    "odia"          0.06
    "ổng"           0.06
    "onestly"       0.06
    "!important"    0.06
    " magical"      0.06
    "än"            0.06
    Activation Density: 0.009%

    No Known Activations