    Explanations

    references to social media moderation and its implications

    NEGATIVE LOGITS
    "leck"           -0.16
    "трон"           -0.15
    " Tablet"        -0.15
    "reff"           -0.15
    "tablet"         -0.15
    " tablet"        -0.15
    " Lair"          -0.14
    "RSS"            -0.14
    "|max"           -0.14
    "sector"         -0.14

    POSITIVE LOGITS
    " moderation"     0.31
    " Moder"          0.26
    " removal"        0.26
    " moder"          0.25
    " moderators"     0.25
    " Removal"        0.23
    "Moder"           0.23
    " removed"        0.22
    " removing"       0.22
    " flags"          0.22

    Activation Density: 0.037%

    No Known Activations