    Explanations

    The neuron activates on words expressing the disabling or prevention of a feature (e.g. “disable,” “prevent”).

    Negative Logits
        " ст"              -0.06
        " Auss"            -0.06
        "plants"           -0.06
        (not rendered)     -0.06
        "\tattack"         -0.06
        "415"              -0.05
        " Schiff"          -0.05
        " який"            -0.05
        "(levels"          -0.05
        "uni"              -0.05

    Positive Logits
        "_container"       0.07
        " düşünc"          0.07
        "ционного"         0.07
        ".Security"        0.07
        " прош"            0.07
        " nomin"           0.07
        "FILENAME"         0.07
        " button"          0.07
        " mortar"          0.07
        "\Notifications"   0.06
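The two lists above rank the tokens whose output logits this feature most lowers (negative) and most raises (positive). The page does not say how these values were computed; one common approach, assumed here, is to take the dot product of the feature's output direction with each token's unembedding vector and sort the results. The sketch below uses a hypothetical three-dimensional direction and a toy four-token vocabulary (`tok_a` to `tok_d`), not the model's real weights; `logit_effects` and `top_and_bottom` are illustrative names.

```python
# Minimal sketch (assumption): rank tokens by how strongly a feature's output
# direction pushes their logits up or down. The direction, the unembedding
# vectors, and the token names are hypothetical toy values.

def logit_effects(direction, unembed):
    """Dot product of the feature direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    direction = [0.5, -0.25, 0.0]  # hypothetical feature output direction
    unembed = {                    # hypothetical unembedding rows
        "tok_a": [0.1, -0.1, 0.3],
        "tok_b": [-0.1, 0.1, 0.0],
        "tok_c": [0.0, 0.2, 0.5],
        "tok_d": [0.12, 0.0, 0.0],
    }
    pos, neg = top_and_bottom(logit_effects(direction, unembed), 2)
    print("positive:", [t for t, _ in pos])  # prints positive: ['tok_a', 'tok_d']
    print("negative:", [t for t, _ in neg])  # prints negative: ['tok_b', 'tok_c']
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; a real model vocabulary would typically use a vectorized top-k instead of a Python loop.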
    Activation density: 0.039%
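The page reports the density figure without defining it. A common definition, assumed here, is the percentage of token positions on which the feature fires, meaning its activation exceeds a threshold (taken as 0.0 in this sketch). The function name `activation_density` and the synthetic activation list are hypothetical.

```python
# Minimal sketch (assumption): activation density as the percentage of token
# positions whose activation exceeds a threshold. The 0.0 threshold and the
# synthetic activations below are hypothetical.

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Synthetic example: 39 firing positions out of 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(39):
        acts[i * 1000] = 1.0
    print(f"{activation_density(acts):.3f}%")  # prints 0.039%
```

Under this definition, a density of 0.039% means the feature fires on roughly 39 of every 100,000 tokens, which is consistent with a sparse, narrowly targeted feature.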

    No Known Activations