INDEX
    Explanations

    mentions of bans on specific activities or objects

    references to prohibitions or restrictions

    Negative Logits
    Token              Logit
    " IMAGES"          -0.77
    " Generations"     -0.76
    "lycer"            -0.67
    " Temper"          -0.67
    " Barg"            -0.66
    " Apostles"        -0.65
    " Editors"         -0.65
    "¯¯"               -0.63
    " Directions"      -0.63
    "rious"            -0.62
    Positive Logits
    Token              Logit
    "ishment"          1.22
    "hammer"           1.04
    "hee"              0.96
    "ishing"           0.93
    "tering"           0.90
    "nered"            0.90
    "zai"              0.88
    "ish"              0.87
    "ished"            0.87
    "icip"             0.84
    Act Density 0.026%

    No Known Activations
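
    A minimal sketch of how an activation density figure such as the 0.026% above could be computed, assuming it means the fraction of sampled token positions on which the feature fires. That definition, the function name activation_density, the threshold parameter, and the sample data are all assumptions made for illustration; none of them come from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (# active positions) / (# positions sampled).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 26 of them active.
acts = [0.0] * 100_000
for i in range(26):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

    With 26 active positions out of 100,000 the ratio is 0.00026, which is 0.026% when expressed as a percentage.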