INDEX
    Explanations

    content generated or created by specific news organizations

    phrases related to news source attribution and content creation

    Negative Logits
    Token             Logit
    " DeL"            -0.62
    "escription"      -0.61
    "cause"           -0.59
    "nown"            -0.58
    "essee"           -0.57
    "iology"          -0.57
    "uality"          -0.56
    " memor"          -0.56
    "ppo"             -0.56
    " Examination"    -0.55
    Positive Logits
    Token             Logit
    " Sketch"         0.75
    " Pastebin"       0.69
    "acy"             0.68
    "ACY"             0.67
    " cookies"        0.66
    " strives"        0.63
    "Asset"           0.63
    " contributors"   0.61
    "avascript"       0.60
    "roy"             0.60
    Activation Density: 0.059%

    No Known Activations