    Explanations

    Phrases and words related to vandalism or acts of destruction.

    The neuron activates on occurrences of the root “vandal,” i.e., words referring to vandalism such as “vandalism” and “vandalized.”

    Negative Logits
        Token           Logit
        "...↵↵↵↵"       -0.07
        "middleware"    -0.06
        "_STOP"         -0.06
        " trouver"      -0.06
        (not rendered)  -0.06
        "ěř"            -0.06
        " nový"         -0.06
        "OUSE"          -0.06
        " mời"          -0.05
        "irect"         -0.05
    Positive Logits
        Token           Logit
        " vandal"       0.11
        " vandalism"    0.10
        " graffiti"     0.08
        " mutil"        0.08
        " DIY"          0.07
        " kid"          0.07
        " القدم"        0.07
        "Ze"            0.07
        " GIF"          0.07
        "188"           0.07
    Activation Density: 0.002%

    No Known Activations