INDEX
    Explanations

    phrases indicating a recommendation to view content

    news headlines that are labeled as "MUST WATCH."

    Negative Logits
    Token        Logit
    "acia"       -0.71
    "ised"       -0.64
    " bleed"     -0.63
    " bun"       -0.62
    " nib"       -0.61
    " clay"      -0.60
    " halluc"    -0.60
    " envy"      -0.58
    " recl"      -0.58
    " amalg"     -0.58

    Positive Logits
    Token                                                 Logit
    " WATCH"                                              0.75
    " VIDEOS"                                             0.73
    " Watching"                                           0.69
    "esome"                                               0.68
    " IMAGES"                                             0.68
    "------------------------------------------------"    0.68
    "dog"                                                 0.67
    "...]"                                                0.63
    "ARDS"                                                0.62
    "degree"                                              0.62

    Activation Density: 0.010%

    No Known Activations