    Explanations

    mentions of being banned

    instances of the word "banned" across various contexts

    Negative Logits
    Token             Logit
    "eah"             -0.80
    "issance"         -0.73
    " IMAGES"         -0.73
    " Generations"    -0.69
    "Auth"            -0.68
    "sie"             -0.66
    "rious"           -0.66
    "ickey"           -0.64
    "ilant"           -0.64
    "prise"           -0.64
    Positive Logits
    Token             Logit
    "hee"             0.91
    "ishment"         0.80
    " banning"        0.77
    "netflix"         0.76
    " banned"         0.76
    " bans"           0.73
    " substances"     0.73
    " smoking"        0.72
    "hammer"          0.72
    "wana"            0.71
    Activation Density: 0.019%

    No Known Activations