INDEX
    Explanations

    terms related to censorship

    terms related to censorship and its implications

    Negative Logits
    "MER"          -0.68
    " Finn"        -0.64
    "WORK"         -0.63
    " Dew"         -0.63
    "NO"           -0.63
    " Baker"       -0.62
    " Davidson"    -0.62
    " Eston"       -0.61
    " Kush"        -0.61
    "odka"         -0.60
    Positive Logits
    "orious"       1.25
    "zers"         1.04
    "oring"        0.98
    "zer"          0.96
    "orship"       0.95
    "ors"          0.90
    "asing"        0.89
    "uses"         0.89
    " cens"        0.84
    "ussen"        0.84
    Act Density 0.042%

    No Known Activations