INDEX
    Explanations

    mentions of specific formatting elements within a text, such as advertisement separators or story continuations

    instances of advertisement or promotional content

    Negative Logits
    " homebrew"   -0.73
    "addin"       -0.65
    "EStream"     -0.64
    "romy"        -0.62
    "leans"       -0.61
    " adjunct"    -0.60
    "boro"        -0.60
    "ktop"        -0.59
    " estranged"  -0.56
    "issance"     -0.54

    Positive Logits
    "ccording"    0.84
    "JUST"        0.78
    "SPONSORED"   0.72
    "Spain"       0.71
    "ATT"         0.67
    "AIN"         0.67
    "eria"        0.67
    "TAG"         0.66
    "Reward"      0.65
    "Prev"        0.65
    Activation Density: 0.073%

    No Known Activations