    Explanations

    assassination

    The neuron activates on words related to assassination (variants of “assassinate,” “assassination,” etc.).

    Negative Logits
    "Thor"          -0.07
    "(RE"           -0.07
    " behaved"      -0.06
    "\tTime"        -0.06
    "movement"      -0.06
    "(col"          -0.06
    " thought"      -0.06
    " plu"          -0.06
    " lore"         -0.06
    " forcefully"   -0.06
    Positive Logits
    " assassin"         0.12
    " Assassin"         0.10
    " assass"           0.10
    " assassination"    0.10
    "assin"             0.10
    " Assass"           0.10
    "าะห"               0.08
    (token not shown)   0.07
    "Attribute"         0.07
    "axon"              0.07
    Act Density 0.003%
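
The page does not define the density figure. A common reading is the percentage of sampled tokens on which the neuron's activation is above zero, and the sketch below assumes that reading. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from the page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations for eight tokens; only one token fires.
sample = [0.0, 0.0, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3f}%")  # prints 12.500%
```

Under this reading, a density of 0.003% would correspond to roughly 3 active tokens per 100,000 sampled.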

    No Known Activations