    Explanations

    The neuron fires on legal "release" terminology: words such as release, releases, and discharges, along with related settlement-agreement language.

    Negative Logits
    Token             Logit
    "plode"           -0.06
    " counterfeit"    -0.06
    "-esteem"         -0.06
    "igate"           -0.06
    "KeyPressed"      -0.06
    " effort"         -0.06
    "Her"             -0.06
    " hammer"         -0.06
    "Tuesday"         -0.06
    " Moodle"         -0.06

    Positive Logits
    Token             Logit
    "디시"            0.07
    "ือก"             0.07
    "hes"             0.07
                      0.07
    " Ran"            0.06
    "ιαν"             0.06
    " suspension"     0.06
    " actions"        0.06
    "_delivery"       0.06
    "üş"              0.06

    Activation Density: 0.003%

    No Known Activations