INDEX
    Explanations

    mentions of bans on various topics or items

    references to bans or prohibitions

    Negative Logits
     Generations    -0.78
     IMAGES         -0.70
    rious           -0.69
    lycer           -0.66
     Temper         -0.66
     Io             -0.65
     Barg           -0.63
    Sea             -0.63
     Editors        -0.63
     PROG           -0.62
    Positive Logits
    ishment         1.27
    hammer          1.09
    hee             0.89
    nered           0.87
    ishing          0.86
    ish             0.85
    jo              0.82
    zai             0.82
    tering          0.82
    icip            0.82
    Activation Density: 0.017%

    No Known Activations