INDEX
    Explanations

    danger or peril

    The neuron flags words that denote risk or endangerment (e.g., “jeopardize,” “risk”).

    Negative Logits
     tantra          -0.06
     surveyed        -0.06
     suburbs         -0.06
     rectangles      -0.06
     bride           -0.06
     від             -0.06
     hive            -0.06
     fuse            -0.06
    (layout          -0.06
     ProgressBar     -0.06

    Positive Logits
     jeopard          0.09
     peril            0.08
    (dep              0.07
    !↵                0.07
     risking          0.07
     LOC              0.07
    )">↵              0.07
    "',↵              0.07
     GPUs             0.06
    ImplOptions       0.06
    Activation Density: 0.006%

    No Known Activations