    Explanations

    This neuron mainly detects words that describe physical harm or injury.

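    Negative Logits
    (Tokens are shown in quotes; a leading space is part of the token.)
    Token             Logit
    "-license"        -0.07
    "简单"            -0.06
    " android"        -0.06
    "crear"           -0.06
    ".Linear"         -0.06
    "_set"            -0.06
    " cả"             -0.06
    " RAM"            -0.06
    " Kara"           -0.06
    " lovers"         -0.06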
    Positive Logits
    Token             Logit
    " derivative"      0.07
    " showcase"        0.07
    "atical"           0.07
    " تاریخی"          0.06
    "asto"             0.06
    ",当"              0.06
    " intelligence"    0.06
    "ocrats"           0.06
    "ical"             0.06
    "ivative"          0.06
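    The logit lists above are the kind of output produced by projecting a neuron's output weights onto the vocabulary and sorting the result. The sketch below assumes an MLP neuron whose output weight vector `w_out` (length d_model) is multiplied by an unembedding matrix `W_U` (d_model rows, one column per token). The function name, toy vocabulary, and weights are hypothetical, and the pipeline behind this page may also fold in layer norm or other scaling.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def neuron_logit_effects(
    w_out: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Pair], List[Pair]]:
    """Return (most negative, most positive) token/logit pairs for one neuron.

    w_out: hypothetical output weights of the neuron, length d_model.
    W_U:   hypothetical unembedding, d_model rows by vocab_size columns.
    """
    d_model = len(w_out)
    # Direct effect of the neuron on each token's logit: w_out @ W_U.
    logits = [
        sum(w_out[i] * W_U[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    # Most negative first, then most positive first.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Toy example: d_model = 2, four-token vocabulary (all values hypothetical).
vocab = [" alpha", " beta", " gamma", " delta"]
W_U = [
    [0.5, -0.2, 0.4, -0.1],  # residual dimension 0
    [0.1, -0.3, 0.2, 0.0],   # residual dimension 1
]
w_out = [0.1, 0.2]
neg, pos = neuron_logit_effects(w_out, W_U, vocab, k=2)
print("negative:", [t for t, _ in neg])
print("positive:", [t for t, _ in pos])
```

    Because only the direct path through `W_U` is measured, these lists describe what the neuron pushes the next-token prediction toward, not what inputs make it fire.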
    Activation Density: 0.024%
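    Activation density is usually read as the fraction of sampled token positions on which the neuron's activation exceeds a threshold. At 0.024%, that is about 24 firing positions per 100,000 tokens. The sketch below assumes a threshold of 0 and a flat list of activations; the threshold and token sample actually used for this page are not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    The default threshold of 0 is an assumption, not a documented setting.
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        return 0.0  # no tokens sampled
    return firing / total


# Hypothetical sample: 100,000 positions, of which 24 fire.
acts = [0.0] * 100_000
for i in range(0, 24_000, 1_000):
    acts[i] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.024%
```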

    No Known Activations