INDEX
    Explanations

    The neuron activates on the verb “pose” and its inflected forms (such as “poses”) when the verb indicates that something presents a risk or threat.

    Negative Logits
    Token            Logit
    " reclaim"       -0.08
    "-third"         -0.07
    " Handbook"      -0.07
    "lett"           -0.07
    " increments"    -0.07
    "Atlantic"       -0.07
    " Atlantic"      -0.07
    " seventh"       -0.07
    "-heart"         -0.06
    "club"           -0.06
    Positive Logits
    Token            Logit
    " posing"        0.12
    " posed"         0.11
    " poses"         0.09
    " pose"          0.09
    "Pos"            0.08
    " Pos"           0.07
    " Pose"          0.07
    "imestep"        0.07
    "Pose"           0.07
    " occured"       0.07
    Act Density 0.015%

    No Known Activations