INDEX
    Explanations

    mentions of content removal and potential repercussions

    references to tweet removals or alterations

    Negative Logits
        " kindred"        -0.68
        " Growing"        -0.66
        "htaking"         -0.66
        " stereotype"     -0.62
        " superpower"     -0.60
        " marrying"       -0.60
        "Growing"         -0.59
        "inav"            -0.59
        " distinguishes"  -0.59
        " dominates"      -0.58
    Positive Logits
        " refund"         1.00
        " screenshots"    0.86
        " DMCA"           0.85
        " refunds"        0.84
        " deleted"        0.84
        " apologised"     0.83
        " retracted"      0.82
        " apologies"      0.82
        " redacted"       0.82
        " reinstated"     0.79
    Activation Density: 1.586%

    No Known Activations