INDEX
    Explanations

    The neuron activates on instructional or advisory language: phrases that present steps, tips, or guidance.

    Negative Logits
    Token          Logit
    " Sr"          -0.07
    " Daily"       -0.06
    "ReLU"         -0.06
    "ях"           -0.06
    ".getAll"      -0.06
    ".getBody"     -0.06
    "edish"        -0.06
    " activist"    -0.05
    " beds"        -0.05
    ".toastr"      -0.05
    Positive Logits
    Token          Logit
    " may"         0.07
    " Watt"        0.07
    "_TEMPLATE"    0.07
    " příprav"     0.07
    " Common"      0.07
    "-Co"          0.07
    " enticing"    0.06
    "्च"           0.06
    " might"       0.06
    " emo"         0.06
    Activation Density: 0.037%
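
The density figure above is not defined on this page. A minimal sketch of one common reading, assuming it means the fraction of token positions where the neuron's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the tool that produced this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron fires.
    The dashboard's own definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which fire.
sample = [0.0] * 10_000
for i in (12, 345, 6_789, 9_001):
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage, like the figure above
```

Under this reading, a density of 0.037% would correspond to the neuron firing on roughly 37 out of every 100,000 token positions.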

    No Known Activations