INDEX
    Explanations

    mentions of bans or prohibitions

    Negative Logits
        " IMAGES"        -0.70
        " Generations"   -0.67
        " Veter"         -0.65
        "mberg"          -0.63
        "eah"            -0.63
        " Remastered"    -0.62
        "lycer"          -0.61
        "eon"            -0.61
        "prise"          -0.61
        "rious"          -0.59

    Positive Logits
        "ishment"         1.02
        "zai"             0.83
        "hee"             0.82
        "hammer"          0.82
        "tering"          0.81
        "unal"            0.80
        "ish"             0.77
        " viol"           0.75
        "ishing"          0.74
        "idding"          0.73

    Act Density 0.660%

    No Known Activations