INDEX
    Explanations

    spoiler warnings

    This neuron detects conditional content‐warning phrases that address the reader (e.g. “if you’re queasy or sensitive…”).

    Negative Logits
    .club         -0.07
    GridLayout    -0.07
     Format       -0.07
    terminal      -0.06
    (timestamp    -0.06
    byte          -0.06
    artists       -0.06
    EAR           -0.06
     fread        -0.06
     biết         -0.06
    Positive Logits
     Drama        0.06
     Magick       0.06
     وغير         0.06
     Велик        0.06
     cigaret      0.06
     maxValue     0.06
    ashing        0.06
    оян           0.06
    rompt         0.06
     sebou        0.06
    Act Density 0.013%

    No Known Activations