INDEX
    Explanations

    references to censorship and banned works

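    Negative Logits (token, logit; tokens quoted, a leading space is part of the token)
        "889"            -0.15
        "loff"           -0.15
        "abyrin"         -0.15
        " pev"           -0.15
        "íħ"             -0.15
        "@student"       -0.14
        "igne"           -0.14
        " Baghd"         -0.14
        "uvw"            -0.14
        " abandonment"   -0.14

    Positive Logits (token, logit; tokens quoted, a leading space is part of the token)
        " censorship"     0.44
        " censor"         0.43
        " c"              0.34
        "ensor"           0.33
        "ensored"         0.31
        " cen"            0.26
        "ensors"          0.25
        " banning"        0.23
        " bans"           0.23
        "ban"             0.23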
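The logit lists are a ranking: the k tokens the feature pushes up most and the k it pushes down most. The sketch below shows only that ranking step. It assumes the per-token logit effects are already available as a dict. Dashboards of this kind commonly derive them by projecting the feature's decoder direction through the model's unembedding, but that is an assumption here. `top_and_bottom_logits` is a hypothetical helper, and the sample dict is a small subset of the values listed above.

```python
import heapq

# Hypothetical helper: rank a feature's per-token logit effects and return
# the k most positive and the k most negative (token, value) pairs.
def top_and_bottom_logits(logit_effects, k=10):
    """logit_effects maps token string -> logit effect (float)."""
    top = heapq.nlargest(k, logit_effects.items(), key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, logit_effects.items(), key=lambda kv: kv[1])
    return top, bottom


# A subset of the values listed above; leading spaces are part of the tokens.
effects = {
    " censorship": 0.44,
    " censor": 0.43,
    " c": 0.34,
    "ensor": 0.33,
    "loff": -0.15,
    " abandonment": -0.14,
}

top, bottom = top_and_bottom_logits(effects, k=2)
print(top)     # [(' censorship', 0.44), (' censor', 0.43)]
print(bottom)  # [('loff', -0.15), (' abandonment', -0.14)]
```

`heapq.nlargest`/`nsmallest` with a `key` behave like a sorted slice, so tied values (such as the several -0.15 entries) keep their input order.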
    Activation Density: 0.075%
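Activation density is the fraction of sampled tokens on which the feature fires. At 0.075%, that is roughly 75 tokens per 100,000, or about 1 in every 1,333. The sketch below is a minimal version. It assumes "fires" means an activation strictly above zero, which is typical for ReLU-style sparse features but is not stated on this page. `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 75 active tokens out of 100,000.
acts = [0.0] * 99_925 + [1.3] * 75
density = activation_density(acts)
print(f"{density:.3%}")  # 0.075%
```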

    No Known Activations