    Explanations

    entities or brand names mentioned in texts

    references to streaming services and digital content

    Negative Logits
    Rated        -0.95
    instead      -0.79
    Reply        -0.71
    Atl          -0.71
    çİĭ          -0.70
    ITH          -0.67
    KEN          -0.66
    }.           -0.64
     ______      -0.64
    OTH          -0.64
    Positive Logits
     caveats          0.72
     disclaimer       0.71
     arrests          0.66
     announcements    0.65
     nods             0.64
     occasional       0.64
     exceptions       0.64
     additions        0.64
     shuffle          0.63
     perks            0.62
    Act Density 0.316%

    No Known Activations