INDEX
    Explanations

    Incidents involving criminal activities or arrests.

    The neuron activates on words that report bodily harm or accident outcomes, especially terms like “injuries,” “injured,” and “casualties.”

    Negative Logits
    .identity       -0.07
    ilenames        -0.07
     Chỉ            -0.06
    (Throwable      -0.06
    、それ             -0.06
    ・━・━            -0.06
                    -0.06
     customerId     -0.06
    ปฏ              -0.06
     Peaks          -0.06
    Positive Logits
     taxes          0.07
    ::_             0.07
     Zero           0.07
    yms             0.06
     intern         0.06
    elow            0.06
     allowance      0.06
     Mag            0.06
     reduction      0.06
     methodologies  0.06
    Act Density 0.041%

    No Known Activations