INDEX
    Explanations

    references to spoilers in a discussion about TV shows

    Negative Logits
    Token         Logit
    "istik"       -0.15
    "enberg"      -0.14
    "çĭ"          -0.14
    "ffa"         -0.14
    " consect"    -0.14
    " flakes"     -0.14
    "intro"       -0.14
    "èµ"          -0.13
    "aptcha"      -0.13
    "apult"       -0.13
    Positive Logits
    Token         Logit
    " Spo"        0.59
    " spoilers"   0.55
    " spoiler"    0.53
    " spoil"      0.53
    "spo"         0.52
    " spo"        0.50
    "Spo"         0.50
    " spoiled"    0.45
    "Spoiler"     0.39
    " spol"       0.31
    Activation Density: 0.036%

    No Known Activations