    Explanations

    The neuron strongly activates on tokens related to torture, mutilation, and other forms of extreme violence.

    Negative Logits
    "itudes"          -0.06
    " reminiscent"    -0.06
    " niž"            -0.06
    " pragmatic"      -0.06
    " फर"             -0.06
    "ประเทศไทย"       -0.06
    ".radians"        -0.06
    "ільш"            -0.06
    " azi"            -0.06
    Positive Logits
    " torture"        0.13
    " tortured"       0.11
    " Tort"           0.08
    " Async"          0.07
    " Palace"         0.07
    " Whole"          0.07
    "orta"            0.07
    " Cort"           0.07
    " Circuit"        0.07
    " sous"           0.06
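    Logit lists like the two above are commonly produced by projecting a neuron's output weights onto the model's unembedding matrix and ranking tokens by the resulting score. The sketch below assumes that method (the page does not say how its lists were computed) and ignores any final layer norm. The function name `neuron_logit_effects` and the toy weights are hypothetical, made up only to show the mechanics.

```python
def neuron_logit_effects(w_out, W_U, vocab, k=3):
    """Rank tokens by the direct effect of a neuron's output on their logits.

    Hypothetical helper (an assumption, not the dashboard's code).
    w_out: list of d_model floats, the neuron's output direction.
    W_U:   d_model rows, each a list of len(vocab) floats (unembedding matrix).
    Returns (top_k_positive, top_k_negative) as lists of (token, score).
    """
    d_model = len(w_out)
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the neuron's output direction with token t's unembedding column
        s = sum(w_out[i] * W_U[i][t] for i in range(d_model))
        scores.append((token, s))
    # Highest scores first; reversing the ranking gives the most negative first
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Made-up toy weights: d_model = 2, vocabulary of 4 tokens
vocab = [" torture", " tortured", " pragmatic", " Palace"]
w_out = [1.0, 0.5]
W_U = [
    [0.10, 0.08, -0.04, 0.05],
    [0.06, 0.06, -0.04, 0.04],
]
top, bottom = neuron_logit_effects(w_out, W_U, vocab, k=2)
print(top)
print(bottom)
```

    With these toy numbers, " torture" and " tortured" rank highest and " pragmatic" lowest. Real unembedding matrices have tens of thousands of columns, so in practice only the top and bottom few tokens are shown, as on this page.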
    Activation Density: 0.003%
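    Activation density is usually reported as the percentage of dataset tokens on which the unit fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the actual threshold and dataset behind the figure above are not stated, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        return 0.0
    # Count tokens on which the unit is considered active
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the unit fires on 3 out of 100,000 tokens
acts = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    acts[i] = 1.5
density = activation_density(acts)
print(f"{density:.3f}%")
```

    Under that assumption, a density of 0.003% corresponds to roughly 3 active tokens per 100,000, which marks this as a very sparsely firing unit.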

    No Known Activations